
Sr. Hadoop Developer Resume


Aurora, Illinois

PROFESSIONAL SUMMARY:

  • Overall 8+ years of professional IT experience in the analysis, design, development, deployment and maintenance of critical software and big data applications.
  • In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming.
  • Expertise in converting MapReduce programs into Spark transformations using Spark RDDs.
  • Expertise in Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Experience in implementing real-time event processing and analytics with Spark Streaming on top of messaging systems such as Kafka.
  • Experience using Kafka and Kafka brokers to feed the Spark context and processing live streaming data as RDDs.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
  • Experience with all major Hadoop distributions, including Cloudera, Hortonworks, MapR and Apache.
  • Experience in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Extensive experience working with Spark features such as RDD transformations, Spark MLlib and Spark SQL.
  • Hands-on experience writing Hadoop jobs for analyzing data using HiveQL queries, Pig Latin (a data flow language) and custom MapReduce programs in Java.
  • Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
  • Extensive experience in collecting streaming data such as log data and storing it in HDFS using Apache Flume.
  • Experienced in using Pig scripts for transformations, event joins, filters and pre-aggregations before storing the data in HDFS.
  • Created custom UDFs for Pig and Hive to incorporate methods and functionality written in Python/Java into Pig Latin and HiveQL.
  • Good experience with NoSQL databases like HBase, MongoDB and Cassandra.
  • Experience using CQL with the Cassandra Java API to retrieve data from Cassandra tables.
  • Hands-on experience querying and analyzing data from Cassandra for quick searching, sorting and grouping through CQL.
  • Experience working with MongoDB for distributed storage and processing.
  • Experienced in extracting data from MongoDB through Sqoop, placing it in HDFS and processing it there.
  • Worked on importing data into HBase using the HBase shell and the HBase client API (an illustrative sketch follows this summary).
  • Experience in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Good knowledge of scheduling jobs in Hadoop using the FIFO, Fair and Capacity schedulers.
  • Experienced in designing both time-driven and data-driven automated workflows using Oozie and ZooKeeper.
  • Experience working with Solr to develop search over unstructured data in HDFS.
  • Extensively used Solr indexing to enable searching on non-primary-key columns in Cassandra keyspaces.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server and MySQL.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like flat files, XML files and databases.
  • Supported various reporting teams and have experience with the data visualization tool Tableau.
  • Implemented data quality checks in the ETL tool Talend and have good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica and Talend.
  • Experienced in and in-depth knowledge of cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, Redshift and Microsoft Azure.
  • Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies like Waterfall and Agile.
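
A minimal sketch of the HBase client API usage mentioned above; the table name, column family, row key and values are hypothetical, shown only to illustrate the pattern of writing aggregated results into HBase.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical table and columns; writes one aggregated row with the HBase client API.
public class HBasePutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("daily_aggregates"))) {

            Put put = new Put(Bytes.toBytes("2016-05-01#store-42"));         // row key
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("total_sales"),  // family, qualifier
                          Bytes.toBytes("12873.55"));                        // value
            table.put(put);                                                  // write the row
        }
    }
}
```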

TECHNICAL SKILLS:

Languages: C, C++, Python, R, PL/SQL, Java, HiveQL, Pig Latin, Scala, UNIX shell scripting.

Hadoop Ecosystem: HDFS, YARN, Scala, MapReduce, Hive, Pig, Zookeeper, Sqoop, Oozie, Bedrock, Flume, Kafka, Impala, NiFi, MongoDB, HBase.

Databases: Oracle, MS SQL Server, MySQL, PostgreSQL, NoSQL (HBase, Cassandra, MongoDB), Teradata.

Tools: Eclipse, NetBeans, Informatica, IBM DataStage, Talend, Maven, Jenkins.

Hadoop Platforms: Hortonworks, Cloudera, Azure, Amazon Web services (AWS).

Operating Systems: Windows XP/2000/NT, Linux, UNIX.

Amazon Web Services: Redshift, EMR, EC2, S3, RDS, Cloud Search, Data Pipeline, Lambda.

Version Control: GitHub, SVN, CVS.

Packages: MS Office Suite, MS Visio, MS Project Professional.

PROFESSIONAL EXPERIENCE:

Confidential, Aurora, Illinois

Sr. Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Created Kafka producers and consumers for Spark Streaming, which receives data from the patients' different learning systems.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Evaluated the performance of Apache Spark in analyzing genomic data.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
  • Wrote a Storm topology to accept events from the Kafka producer and emit them into Cassandra.
  • Created POCs using the Spark SQL and MLlib libraries.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked closely with EC2 infrastructure teams to troubleshoot complex issues.
  • Worked with the AWS cloud, creating EMR clusters with Spark to process and analyze raw data and access data from S3 buckets.
  • Involved in installing EMR clusters on AWS.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Applied transformation rules on top of DataFrames.
  • Worked with different file formats such as TextFile, Avro, ORC and Parquet for Hive querying and processing.
  • Developed Hive UDFs and UDAFs for rating aggregation.
  • Developed a Java client API for CRUD and analytical operations by building a RESTful server and exposing data from NoSQL databases such as Cassandra over REST.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Worked extensively with Sqoop to move data from DB2 and Teradata to HDFS.
  • Collected log data from web servers and integrated it into HDFS using Kafka.
  • Provided ad-hoc queries and data metrics to business users using Hive and Impala.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Implemented row-level updates and real-time analytics on Cassandra data using CQL.
  • Used CQL with the Cassandra Java API to retrieve data from Cassandra tables (see the sketch following this list).
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Worked on Solr configuration and customizations based on requirements.
  • Indexed documents using Apache Solr.
  • Extensively used ZooKeeper for coordinating and scheduling Spark jobs.
  • Worked with BI teams in generating the reports on Tableau.
  • Used JIRA for bug tracking and CVS for version control.
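
A minimal sketch of reading Cassandra data over CQL with the DataStax Java driver, as referenced above; the contact point, keyspace, table and column names are illustrative placeholders rather than the actual project schema.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Hypothetical keyspace/table; fetches recent events for one customer via CQL (DataStax driver 3.x).
public class CassandraCqlExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("analytics")) {

            // Rows come back ordered by the clustering column within the partition
            ResultSet rows = session.execute(
                "SELECT event_type, event_time FROM customer_events "
                    + "WHERE customer_id = ? ORDER BY event_time DESC LIMIT 100",
                "cust-001");

            for (Row row : rows) {
                System.out.println(row.getString("event_type") + " @ " + row.getTimestamp("event_time"));
            }
        }
    }
}
```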

Environment: Hadoop, MapReduce, HDFS, PIG, Hive, Sqoop, Oozie, Storm, Kafka, Spark, Spark Streaming, Scala, Cassandra, Cloudera, ZooKeeper, AWS, Solr, MySQL, Shell Scripting, Java, Tableau.

Confidential, Austin, Texas

Sr. Hadoop Developer

Responsibilities:

  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming against streaming sources such as Kafka and JMS.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Worked with Amazon AWS services such as EMR and EC2 for fast and efficient processing of big data.
  • Involved in loading data from Linux file systems, servers and Java web services using Kafka producers and partitions (a producer sketch follows this list).
  • Applied custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Implemented a POC with Hadoop and extracted data into HDFS with Spark.
  • Used Spark SQL with Scala to create DataFrames and performed transformations on them.
  • Implemented Spark SQL to access Hive tables in Spark for faster data processing.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed code to read the data stream from Kafka and send it to the respective bolts through their respective streams.
  • Worked on Spark Streaming with Apache Kafka for real-time data processing.
  • Created Kafka producers and consumers for Spark Streaming.
  • Developed MapReduce jobs using the MapReduce Java API and HiveQL.
  • Developed UDF, UDAF and UDTF functions and used them in Hive queries.
  • Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
  • Experienced in using the Avro data serialization system to handle Avro data files in MapReduce programs.
  • Experienced in optimizing Hive queries and joins to handle different data sets.
  • Configured Oozie schedulers to handle different Hadoop actions on a timely basis.
  • Involved in ETL, data integration and migration by writing Pig scripts.
  • Used different file formats such as text files, SequenceFiles and Avro with Hive SerDes.
  • Integrated Hadoop with Solr and implemented search algorithms.
  • Experience with Storm for handling real-time processing.
  • Hands-on experience working with the Hortonworks distribution.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked hands-on with NoSQL databases such as MongoDB for POC purposes, storing images and URIs.
  • Designed and implemented MongoDB and associated RESTful web service.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Designed data aggregations in Hive for ETL processing on Amazon EMR to process data per business requirements.
  • Involved in writing test cases and implementing test classes using MRUnit and mocking frameworks.
  • Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
  • Set up Spark on EMR to process large volumes of data stored in Amazon S3.
  • Experience processing large volumes of data and running processes in parallel using Talend functionality.
  • Used the Talend tool to create workflows for processing data from multiple source systems.
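
A minimal sketch of the kind of Kafka producer described above, feeding log lines into a topic consumed by the Spark Streaming jobs; the broker list, topic name and record content are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical brokers/topic; publishes one log line keyed by user id.
public class LogEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String line = "2016-08-12 10:15:02 INFO user=u123 action=page_view";
            // Keying by user id keeps events for one user on the same partition
            producer.send(new ProducerRecord<>("web-logs", "u123", line));
        }
    }
}
```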

Environment: MapReduce, HDFS, Sqoop, LINUX, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.

Confidential, CA

Hadoop developer

Responsibilities:

  • Developed solutions to process data into HDFS (Hadoop Distributed File System), process within Hadoop and emit the summary results from Hadoop to downstream systems.
  • Developed a wrapper script around the Teradata Connector for Hadoop (TDCH) to support optional parameters.
  • Used Sqoop extensively to ingest data from various source systems into HDFS.
  • Hive was used to produce results quickly based on the report that was requested.
  • Played a major role in working with the team to leverage Sqoop for extracting data from Teradata.
  • Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
  • Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
  • Involved in Hive-HBase integration by creating Hive external tables and specifying HBase as the storage format.
  • Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables (see the validation mapper sketch after this list).
  • Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
  • Involved in writing optimized Pig scripts as well as developing and testing Pig Latin scripts.
  • Worked on custom Pig loader and storage classes to work with a variety of data formats such as JSON and compressed CSV.
  • Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
  • Involved in moving log files generated from various sources to HDFS for further processing through Flume.
  • Worked on different file formats such as text files, Parquet, SequenceFiles, Avro and record columnar (RC) files.
  • Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
  • Kerberos security was implemented to safeguard the cluster.
  • Worked on a stand-alone as well as a distributed Hadoop application.
  • Tested the performance of the data sets on various NoSQL databases.
  • Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
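
A minimal sketch of the map-only validation step described above; the delimiter, expected field count and counter names are assumptions used only to illustrate dropping malformed records before the Hive load.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical layout: pipe-delimited records with a fixed number of fields.
public class RecordValidationMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 12;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        if (fields.length == EXPECTED_FIELDS) {
            context.write(NullWritable.get(), value);                        // keep valid record
        } else {
            context.getCounter("validation", "bad_records").increment(1);    // count and drop
        }
    }
}
```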

Environment: Hadoop, HDFS, Pig, Flume, Hive, MapReduce, Sqoop, Oozie, Zookeeper, HBase, Java, Eclipse, SQL Server, Shell Scripting.

Confidential, Minneapolis, MN

Hadoop/Java developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Migrated existing SQL queries to HiveQL queries to move to big data analytical platform.
  • Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
  • Installed and configured Cassandra DSE multi-node, multi-data center cluster.
  • Designed and implemented a 24-node Cassandra cluster for a single-point inventory application.
  • Analyzed the performance of the Cassandra cluster using nodetool tpstats and cfstats for thread and latency analysis.
  • Implemented real-time analytics on Cassandra data using the Thrift API.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Loaded and transformed large data sets into HDFS using Hadoop fs commands.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Implemented UDFs and UDAFs in Java and Python for Hive to handle processing that cannot be done with Hive's built-in functions (see the UDF sketch after this list).
  • Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
  • Involved in writing optimized Pig scripts as well as developing and testing Pig Latin scripts.
  • Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
  • Designed the logical and physical data models and wrote DML scripts for the Oracle 9i database.
  • Used Hibernate ORM framework with Spring framework for data persistence.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in developing templates and screens in HTML and JavaScript.
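
A minimal sketch of a simple Hive UDF of the kind mentioned above, written in Java; the normalization rule is purely illustrative and not the project's actual logic. After packaging, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: strips non-digit characters so phone numbers compare consistently.
public class NormalizePhoneUdf extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // e.g. "(612) 555-0199" -> "6125550199"
        String digits = input.toString().replaceAll("[^0-9]", "");
        return new Text(digits);
    }
}
```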

Environment: Java, HDFS, Cassandra, Map Reduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.
