Sr. Hadoop Developer Resume
Hartford, CT
SUMMARY:
- 8+ Years of professional experience in IT which includes more than 4 years of work experience in Big Data, Hadoop ecosystem related technologies in Banking and Healthcare sectors.
- Experience with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Yarn, Oozie, and Zookeeper.
- Experience in Spark Streaming to receive real time data and store the stream data into HDFS.
- Experience with common Big Data technologies such as Hadoop, HBase, Cassandra, MongoDB and Impala.
- Experience in developing NoSQL database by using CRUD, Sharding, Indexing and Replication.
- Knowledge of processing and analyzing real-time data streams/flows using Kafka and HBase.
- Assisted Pig and Hive developers in optimizing their queries, strategized on design and implementation approach, and guided the team's efforts toward proven best practices.
- Developed workflows in Oozie to automate loading data into HDFS, pre-processing and analyzing it, and training the classifier using MapReduce, Pig and Hive jobs.
- Experience in importing and exporting the data using Sqoop and Flume from HDFS to Relational Database System and vice-versa.
- Used Zookeeper to provide coordination services to the cluster.
- Extensive experience in developing PIG Latin Scripts and using Hive Query Language for data analytics.
- Worked on Hive for further analysis and for transforming files from different analytical formats into text files.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- Good understanding of cloud configuration in Amazon web services (AWS).
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience working with Oracle, DB2, and MySQL databases.
- Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of Core Java design patterns.
- Involved in writing shell scripts for scheduling and automating tasks.
- Good knowledge of the Spark Machine Learning Library and a thorough understanding of various types of algorithms.
- Good communication skills, work ethics and the ability to work in a team efficiently with good leadership skills.
TECHNICAL SKILLS:
Big Data/Hadoop: HDFS, Hadoop MapReduce, Zookeeper, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka.
Languages: Scala, J2SE, J2EE, SQL/PLSQL, R, and Python.
Methodologies: Agile, V-model, Waterfall model
Database: HBase, MongoDB, Cassandra, SQL Server, Oracle.
Web Tools/Frameworks: HTML, XML, JDBC, JSON, JSP, and Tableau
Scripting: Bash, Shell, Python and jQuery
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Hartford, CT
Responsibilities:
- Developed a state-of-the-art API for data ingestion that moves data from RDBMS to Hadoop and vice versa.
- Used Spark Scala and Java MapReduce API to develop the above API.
- Performed performance tuning of Spark applications, setting the right batch interval time, the correct level of parallelism, and memory tuning.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, efficient joins and transformations.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Utilized Apache Phoenix to hold all the metadata of the processes and for memory management of MR and Spark Jobs.
- Developed business-specific custom UDFs in Hive and Pig.
- Worked in loading and transforming large sets of structured, semi-structured and unstructured data.
- Migrated ETL processes from Oracle to Hive to test ease of data manipulation.
- Automated shell scripts (for calling Java and Scala programs) using AutoSys.
- Performed Data Quality checks like flagging duplicates, null value checks, etc., for the data in Hadoop using Spark Scala API.
- Utilized Avro serialization for performance enhancement.
Environment: Apache Spark, Hadoop, Java, Scala, Apache Phoenix, Hive, Pig, Avro, Sqoop, UNIX, YARN, Oozie, Hortonworks, Eclipse, SVN and GitHub.
Hadoop Developer
Confidential, St. Louis, MO
Responsibilities:
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Worked on Kafka for message queuing solutions.
- Implemented a POC with Spark SQL to interpret complex JSON records.
- Created table definitions and made the contents available as a schema-backed RDD.
- Developed business-specific custom UDFs in Hive and Pig.
- Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
- Exported the aggregated data onto Oracle using Sqoop for reporting on the Tableau dashboard.
- Responsible for developing Pig Latin scripts and HQL.
- Worked with Tableau and Hive Integration using Kerberos authentication.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Managed and reviewed Hadoop log files.
- Used Zookeeper to provide coordination services to the cluster.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Migrated ETL processes from Oracle to Hive to test ease of data manipulation.
Environment: Apache Hadoop, Java, Cassandra, Hive, Pig, HQL, Sqoop, UNIX, Tableau, Apache MRUnit, Spark SQL, Spark Streaming, Kafka, Flume, YARN, Oozie and Apache Zookeeper.
Hadoop Developer
Confidential, Columbia, MD
Responsibilities:
- Involved in writing data refinement Pig scripts and Hive queries (HQL).
- Performed data transformation using tools like Pig and MapReduce.
- Developed Pig UDFs in Java for custom data for various levels of optimization.
- Performed data analysis using Hive and developed Hive UDFs in Java.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Worked with the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
- Wrote shell scripts to dump data from MySQL to HDFS.
- Analyzed large volumes of structured data using Spark SQL.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Worked on Maven 3.3.9 for building and managing Java based projects.
- Hands-on experience with using Linux and HDFS shell commands.
- Worked on Kafka for message queuing solutions.
- Developed unit test cases for Mapper, Reducer and Driver classes using MRUnit.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats like text, zip, XML and JSON.
- Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Wrote HBase client programs in Java and web services.
Environment: Apache Hadoop, Java, HBase, Cassandra, Hive, Pig, HQL, Sqoop, UNIX, Eclipse, Apache MRUnit, Spark SQL, Spark Streaming, Kafka, Flume, YARN, Oozie and Apache Zookeeper.
Hadoop Developer
Confidential, East Hanover, NJ
Responsibilities:
- Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
- Experienced in defining and coordinating job flows.
- Gained experience in reviewing and managing Hadoop log files.
- Extracted files from NoSQL databases such as CouchDB and HBase through Sqoop and placed them in HDFS for processing.
- Involved in writing data refinement Pig scripts and Hive queries.
- Good knowledge of running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Coordinated cluster services using ZooKeeper.
- Used Flume to transport logs to HDFS.
- Experienced in moving data from Hive tables into Cassandra for real-time analytics on Hive tables.
- Configured connection between HDFS and Tableau using Impala for Tableau developer team.
- Responsible for managing data coming from different sources.
- Gained experience with various NoSQL databases.
- Experienced in handling administration activities using Cloudera Manager.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries (HQL) that run internally as MapReduce jobs.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Pig, Hive, and Sqoop) as well as system-specific jobs (such as shell scripts).
- Automated all jobs that pull netflow data from relational databases and load it into Hive tables using Oozie workflows, and enabled email alerts for any failure cases.
Environment: Apache Hadoop, Java, JDK1.6, Linux, HBase, Hive, Pig, HQL, Sqoop, Flume, ZooKeeper, NoSQL, R, MapReduce, Cloudera, HDFS, Impala, Tableau, MySQL, MongoDB.
Hadoop Developer
Confidential, Stamford, CT
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in writing MapReduce jobs.
- Used Sqoop and HDFS put or copyFromLocal to ingest data.
- Used Pig to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
- Involved in developing Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Computed metrics that define user experience, revenue, etc. using Java MapReduce.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Proficient work experience with NoSQL databases such as MongoDB.
- Extracted and updated data in MongoDB using the Mongo import and export command-line utilities.
- Used Eclipse and Ant to build the application.
- Involved in using Sqoop for importing and exporting data into HDFS.
- Involved in processing ingested raw data using MapReduce, Apache Pig and Hive.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in pivoting HDFS data from rows to columns and columns to rows.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
Environment: Hadoop, MapReduce, MongoDB, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera, HDFS, Eclipse
Java/ Hadoop Developer
Confidential
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS.
- Acquired good understanding and experience of NoSQL databases such as HBase and Cassandra.
- Installed and configured Hive and also implemented various business requirements by writing Hive UDFs.
- Created complex SQL queries and stored procedures.
- Responsible for managing data coming from different sources.
- Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs that run independently based on time and data availability.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
Environment: Hadoop, HBase, Cassandra, Hive, HiveQL, Pig, Pig Latin, NoSQL, Oozie, Sqoop, Java, Eclipse, SQL, ANT 1.6, Python.
Java Developer
Confidential
Responsibilities:
- Extensively worked in acquiring the requirements from the business analysts and involved in all requirement clarification calls.
- Reviewed the design documents.
- Involved in detailed design and coding activities at offshore.
- Involved in code review.
- Wrote and tested JUnit test classes.
- Provided support to client applications in production and other environments.
- Worked on tickets raised by real-time users, with continuous interaction with end users.
- Prepared the Technical Design Document, understanding document and test cases (UTCs and ITCs).
- Provided Technical & Functional support to the end users during UAT & Production.
- Continuously monitored the application for 100% availability.
Environment: Java, Spring MVC, CVS, AQT, WebSphere and Oracle 10g.
