
Hadoop Developer Resume


Cambridge, MA

SUMMARY

  • A dynamic professional with 8+ years of experience in the IT industry, including 4+ years of solid software development experience on mission-critical, data-intensive applications in Big Data and Hadoop.
  • Experience in developing applications that perform large-scale distributed data processing using Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Java, Spark, Storm, HBase, Cassandra, Kafka, ZooKeeper and Flume.
  • Excellent understanding and knowledge of Big Data and Hadoop architecture.
  • In-Depth knowledge and experience in design, development and deployments of Big Data projects using Hadoop / Data Analytics / NoSQL / Distributed Machine Learning frameworks.
  • Good understanding/knowledge of Spark architecture and components such as Spark Streaming, Spark SQL and the SparkR programming paradigm.
  • Solid experience using YARN and tools like Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling and Zookeeper for coordinating cluster resources.
  • Excellent understanding of CDH, HDP, Pivotal, MapR and Apache Hadoop distributions.
  • Good understanding of HDFS Designs, Daemons, HDFS high availability (HA), HDFS Federation.
  • Experience in analyzing data using Pig Latin, HiveQL, HBase and custom MapReduce programs in Java.
  • Excellent understanding of Hadoop architecture and the different daemons of Hadoop clusters, including the ResourceManager, NodeManager, NameNode and DataNode.
  • Experience in developing cloud computing applications using Amazon EC2, S3 and EMR.
  • Excellent experience working with high-velocity, real-time data processing frameworks such as Kafka, Spark and Storm.
  • Expert in working with the Hive data warehouse: creating tables and distributing data by implementing partitioning and bucketing.
  • Expertise in implementing ad-hoc queries using HiveQL and writing Pig Latin scripts to sort, group, join and filter data during transformation, as per the business requirements.
  • Experienced with different compression techniques (LZO, Snappy, Gzip, Bzip2) to save storage and optimize data transfer over the network.
  • Experience with NoSQL databases like HBase, Cassandra, MongoDB and Couchbase, as well as other ecosystem tools such as ZooKeeper, Oozie, Impala, Storm, Spark Streaming, Spark SQL, Kafka and Flume.
  • Extended Hive and Pig core functionality with custom UDFs written in Java (a minimal UDF sketch follows this summary).
  • Experience in importing and exporting data using Sqoop between HDFS (Hive and HBase) and relational database systems (Oracle, MySQL, DB2, Informix, Teradata).
  • Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs with the Oozie Coordinator.
  • Good understanding of SQL database concepts and data warehousing/integration technologies such as Talend.
  • Extensive experience in Requirements gathering, Analysis, Design, Reviews, Coding and Code Reviews, Unit and Integration Testing.
  • Extensive experience using Java and JEE design patterns such as Singleton, Factory, MVC and Front Controller to reuse the most effective and efficient strategies.
  • Expertise in using IDEs such as WebSphere Studio (WSAD), Eclipse, NetBeans, MyEclipse and WebLogic Workshop.
  • Experience in developing service components using JDBC.
  • Experience in developing and designing web services (SOAP and RESTful).
  • Good amount of experience in developing applications using the Scrum methodology.
  • Good understanding and experience with software development methodologies like Agile and Waterfall.
  • Excellent communication and analytical skills, with the flexibility to learn new technologies that contribute to the company's success.
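
A minimal sketch of the kind of custom Hive UDF in Java mentioned above; the class name and its behavior are illustrative assumptions, not taken from a specific project.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that trims and lower-cases a string column before analysis.
    public class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;            // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Once packaged into a jar, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.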

TECHNICAL SKILLS

Languages/Tools: Java, XML, XSLT, HTML/XHTML, HDML, DHTML, Python, Scala, R, Git.

Big Data Technologies: Apache Hadoop, HDFS, Spark, Hive, Pig, Talend, HBase, Sqoop, Oozie, ZooKeeper, Mahout, Kafka, Storm, Impala.

Java Technologies: JSE: Java architecture, OOP concepts; JEE: JDBC, JNDI, JSF (JavaServer Faces), Spring, Hibernate, SOAP/REST web services.

Web Technologies: HTML, XML, JavaScript, WSDL, SOAP, JSON, AngularJS.

Databases/NoSQL: MS SQL Server, MySQL, Oracle, MS Access, Teradata, Netezza, Greenplum, HBase, Cassandra, MongoDB.

PROFESSIONAL EXPERIENCE

Confidential, Cambridge, MA

Hadoop Developer

Responsibilities:

  • Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
  • Migrated the required data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Involved in creating mappings, sessions and workflows; used a wide range of Informatica transformations.
  • Developed multiple Kafka producers and consumers from scratch using both the low-level and high-level APIs (a producer sketch follows this list).
  • Involved in moving data from Hive tables into Cassandra for real-time analytics.
  • Wrote create, alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
  • Wrote HiveQL scripts to create, load and query tables in Hive.
  • Analyzed, developed and implemented the ETL architecture using Erwin and Informatica.
  • Worked with HiveQL on big data of logs to perform a trend analysis of user behavior on various online modules.
  • Used Informatica transformations of all kinds to load data and to automate the process.
  • Supported MapReduce programs running on the cluster.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Worked on Big Data Integration and Analytics based on Hadoop, SOLR, Spark, Kafka, Storm and web Methods technologies.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Streamed data in real time using Spark with Kafka.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
  • Strongly recommended bringing in Elasticsearch and was responsible for its installation, configuration and administration.
  • Developed and maintained efficient Talend ETL jobs for data ingestion.
  • Worked on the Talend RTX ETL tool; developed and scheduled jobs in the Talend Integration Suite.
  • Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
  • Involved in migrating Hadoop jobs into higher environments such as SIT, UAT and Prod.
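
A minimal sketch of a Kafka producer in Java of the sort described above, written against the newer Java client rather than the legacy low-level/high-level APIs; the broker address, topic name and payload are illustrative assumptions.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ClickStreamProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Publish one clickstream event as a key/value message on an assumed "clickstream" topic.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("clickstream", "user-123", "{\"page\":\"/home\"}"));
            }
        }
    }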

Environment: Hortonworks, HDFS, Hive, HQL scripts, Scala, MapReduce, Storm, Spark, Java, HBase, Cassandra, Pig, Sqoop, Shell Scripts, Oozie Coordinator, MySQL, Tableau, Elasticsearch, Talend, Informatica and SFTP.

Confidential, Indianapolis, IN

Hadoop Developer

Responsibilities:

  • Coordinated with business customers to gather business requirements and interacted with other technical peers to derive technical requirements.
  • Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
  • Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
  • Worked with performance issues and tuning the Pig and Hive scripts.
  • Used the Oozie workflow engine to orchestrate runs of multiple Hive and Pig jobs.
  • Designed and developed Pig data transformation scripts to work against unstructured data from various data points and created a baseline.
  • Implemented Spark RDD transformations and actions to carry out business analysis (a minimal sketch follows this list).
  • Worked on creating and optimizing Hive scripts for data analysts based on the requirements.
  • Created Hive UDFs to encapsulate complex and reusable logic for the end users.
  • Experienced in working with Sequence files and compressed file formats.
  • Converted the raw files (CSV, TSV) to different file formats like Parquet and Avro, with datatype conversion, using Cascading.
  • Developed the code for removing or replacing the error fields in the data using Cascading.
  • Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access.
  • Monitored the Cascading flow using the Driven component to ensure the desired result was obtained.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Developed utility helper classes to fetch data from HBase tables.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
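
A minimal sketch of the Spark RDD transformations and actions referred to above, using the Java API; the input path and the filter predicate are illustrative assumptions.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class OrderAnalysis {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("OrderAnalysis"));
            JavaRDD<String> lines = sc.textFile("hdfs:///data/orders");   // assumed input path

            // Transformation: keep only completed orders (lazy); action: count them (runs the job).
            JavaRDD<String> completed = lines.filter(new Function<String, Boolean>() {
                public Boolean call(String line) {
                    return line.contains("COMPLETED");                    // assumed filter rule
                }
            });
            System.out.println("Completed orders: " + completed.count());

            sc.stop();
        }
    }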

Environment: Cloudera, Linux (CentOS, RedHat), UNIX Shell, Pig, Hive, MapReduce, YARN, Apache Spark, Eclipse, Core Java, JDK 1.7, Oozie Workflows, AWS, EMR, HBase, Cassandra, Sqoop, Scala, Kafka.

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Processed data into HDFS by developing solutions, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
  • Used Sqoop extensively to import data from various systems and sources (such as MySQL) into HDFS.
  • Applied Hive queries to perform data analysis on HBase using the storage handler in order to meet the business requirements.
  • Created components such as Hive UDFs for functionality missing in Hive for analytics.
  • Gained hands-on experience with NoSQL databases like HBase.
  • Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs, using Oozie.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in ETL, Data Integration and Migration.
  • Developing custom aggregate functions using Spark SQL and performed interactive querying.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
  • Worked on Informatica ETL for parsing the data; the parsed data was then loaded into HDFS.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Assisted in setting up a Hortonworks cluster and installing all the ecosystem components through Ambari and manually from the command line.
  • Developed scripts for tracking the changes in file permissions of the files and directories through audit logs in HDFS.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in loading data from UNIX file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Balanced HDFS manually to decrease network utilization and increase job performance.
  • Set up automated processes to archive/clean the unwanted data on the cluster, in particular on HDFS and the local file system.
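
A minimal sketch of the kind of data-cleaning MapReduce job in Java mentioned above, showing only the mapper; the delimiter and the validity rule (exactly five comma-separated fields) are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleaning step: emit well-formed records and silently drop malformed lines.
    public class CleanRecordsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 5) {                        // assumed record layout
                context.write(value, NullWritable.get());
            }
        }
    }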

Environment: Cloudera, HDFS, Hive, Pig, Sqoop, Linux, HBase, Tableau, Informatica, MicroStrategy, Shell Scripting, Ubuntu, RedHat Linux.

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Developed data pipeline using Flume, Sqoop, Pig and Map Reduce to ingest customer behavioural data and purchase histories into HDFS for analysis.
  • Used Pig to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
  • Loaded the aggregated data onto DB2 for reporting on the dashboard.
  • Monitored and debugged Hadoop jobs/applications running in production.
  • Built, packaged and deployed the code to the Hadoop servers.
  • Moved data from Oracle to HDFS and vice versa using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with different file formats and compression techniques to determine standards.
  • Developed Hive scripts for implementing control tables logic in HDFS.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Created HBase tables to store data in various formats coming from different portfolios, and processed the data using Spark.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Hands-on writing of MapReduce code to turn unstructured data into structured data and to insert data into HBase from HDFS (a minimal sketch follows this list).
  • Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
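
A minimal sketch of inserting a structured record into HBase from Java, in the spirit of the HDFS-to-HBase loading described above; the table name, row key layout, column family and qualifier are illustrative assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("events"))) {  // assumed table
                // One structured record per row; column family "d" holds the parsed fields.
                Put put = new Put(Bytes.toBytes("user-123#2015-06-01"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("page"), Bytes.toBytes("/home"));
                table.put(put);
            }
        }
    }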

Environment: JDK, Ubuntu Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, MongoDB, ZooKeeper, HBase, Java, Shell Scripting, Informatica, Cognos, SQL, Teradata.

Confidential

Java/J2EE Developer

Responsibilities:

  • Analysis of system requirements and development of design documents.
  • Development of Spring Services.
  • Development of persistence classes using the Hibernate framework (see the sketch after this list).
  • Development of SOA services using the Apache Axis web service framework.
  • Development of the user interface using Apache Struts 2.0, JSPs, Servlets, jQuery, HTML and JavaScript.
  • Developed client functionality using ExtJS.
  • Development of JUnit test cases to test business components.
  • Extensively used Java Collection API to improve application quality and performance.
  • Extensively used Java 5 features such as generics, the enhanced for loop and type safety.
  • Provided production support and designed enhancements to the existing product.
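
A minimal sketch of a Hibernate persistence class of the kind developed above, mapped with standard JPA annotations; the entity, table and column names are illustrative assumptions.

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // Hypothetical entity mapped to a CUSTOMER table.
    @Entity
    @Table(name = "CUSTOMER")
    public class Customer {

        @Id
        @GeneratedValue
        @Column(name = "CUSTOMER_ID")
        private Long id;

        @Column(name = "NAME")
        private String name;

        public Long getId() { return id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }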

Environment: Java 1.5, SOA, Spring, ExtJS, Struts 2.0, Servlets, JSP, GWT, jQuery, JavaScript, CSS, Web Services, XML, Oracle, WebLogic Application Server, Eclipse, UML, Microsoft Visio.

Confidential 

Java Developer

Responsibilities:

  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Involved in creating the object-to-relational mapping using Hibernate.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented database using SQL Server.
  • Consumed Web Services for transferring data between different applications.
  • Wrote complex SQL queries and stored procedures (a JDBC sketch follows this list).
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Developed user and technical documentation.
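
A minimal sketch of calling a stored procedure over JDBC, in the spirit of the SQL Server and JDBC work above; the connection URL, credentials, procedure name and result columns are illustrative assumptions.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class OrderReportDao {
        public static void main(String[] args) throws Exception {
            // Assumed SQL Server connection details, for illustration only.
            String url = "jdbc:sqlserver://localhost:1433;databaseName=orders";
            try (Connection con = DriverManager.getConnection(url, "app_user", "secret");
                 CallableStatement stmt = con.prepareCall("{call usp_get_orders_by_status(?)}")) {
                stmt.setString(1, "SHIPPED");
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("order_id") + " " + rs.getString("customer"));
                    }
                }
            }
        }
    }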

Environment: Java, SQL, Servlets, HTML, XML, JavaScript, Spring, Hibernate.
