Sr. Hadoop Developer Resume

IL

PROFESSIONAL SUMMARY:

  • Over 8 years of IT industry experience in product development, implementation and maintenance of various cloud-based web applications using Java, J2EE technologies and Big Data ecosystems in a Linux environment
  • Over 4 years of experience working with analytics using Big Data technologies. Have hands-on experience in Storing, Querying, Processing and Data Analysis
  • Comprehensive work experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, Oozie
  • Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses
  • Excellent knowledge of Hadoop architecture: Hadoop Distributed File System (HDFS), Job Tracker, Task Tracker, Name Node, Data Node and the Map Reduce programming paradigm
  • Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL
  • Hands-on experience in various Big Data application phases like Data Ingestion, Data Analytics and Data Visualization
  • Experience in developing efficient solutions to analyze large data sets
  • Experience working on Hortonworks / Cloudera / MapR distributions
  • Extensively worked on MRV1 and MRV2 Hadoop architectures
  • Experience working on Spark, RDDs, DAGs, Spark SQL and Spark Streaming
  • Experience in importing and exporting data using Sqoop between HDFS and Relational Database Management Systems
  • Populated HDFS with huge amounts of data using Apache Kafka and Flume
  • Excellent knowledge of data mapping, extracting, transforming and loading from different data sources
  • Worked with different File Formats like TEXTFILE, SEQUENCE FILE, AVROFILE, ORC, and PARQUET for Hive querying and processing
  • Experience in developing custom MapReduce Programs in Java using Apache Hadoop for analyzing Big Data as per the requirement
  • Well experienced in data transformation using custom MapReduce, Hive and Pig scripts for different types of file formats
  • Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing
  • Experience building solutions with NoSQL databases, such as HBase, Cassandra, MongoDB
  • Firm grip on data modeling, data mapping, database performance tuning and NoSQL map-reduce systems
  • In-depth understanding of Spark architecture including Spark Core, Spark SQL, Data Frames, and Spark Streaming
  • Hands on experience migrating complex MapReduce programs into Apache Spark RDD transformations
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities written in Scala
  • Experience in Kafka installation & integration with Spark Streaming
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing
  • Experience in designing both time driven and data driven automated workflows using Oozie
  • Good understanding of ZooKeeper for monitoring and managing Hadoop jobs
  • Monitoring Map Reduce Jobs and YARN Applications
  • Hands-on experience with Amazon Elastic MapReduce (EMR), Storage S3, EC2 instances and Data Warehousing
  • Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures
  • Used Git for source code and version control management
  • Proficient in Java, J2EE, JDBC, Collection Framework, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP Web Services
  • Strong understanding in Agile and Waterfall SDLC methodologies

TECHNICAL SKILLS:

Big Data Technologies: HDFS, YARN, Map Reduce, Pig, Hive, HBase, Spark, Spark SQL, Spark Streaming, Sqoop, Flume, Kafka, ZooKeeper, Oozie

Big Data Distributions: Hortonworks, Cloudera, MapR, Amazon Elastic MapReduce (EMR)

Programming Languages: Java, Python, Scala, C++, R, JavaScript, Shell Script

Operating Systems: Linux, Windows, Unix

RDBMS: Oracle, MySQL, MS SQL Server

NoSQL Databases: HBase, Cassandra, MongoDB

Frameworks: Spring, Hibernate, Struts

Web Servers: Apache Tomcat, WebSphere, WebLogic

Version Control: Git, SVN, CVS

Integrated Development Environments (IDEs): Java Eclipse IDE, NetBeans, Microsoft SQL Studio

Web Technologies: HTML, CSS, Bootstrap, JavaScript, DOM, XML, Servlets

PROFESSIONAL EXPERIENCE:

Confidential, IL

Sr. Hadoop Developer

Responsibilities:

  • Involved in complete project life cycle starting from design discussion to production deployment
  • Developed Spark applications using Scala utilizing Data frames and Spark SQL API for faster processing of data.
  • Developed highly optimized Spark applications to perform various data cleansing, validation, transformation and summarization activities according to the requirement
  • The data pipeline consisted of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark Data Frames and Scala.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Streamed data in real time using Spark with Kafka.
  • Created Kafka-based applications that monitor consumer lag within Apache Kafka clusters; these are used in production by multiple report suites.
  • Ingested syslog messages, parsed them and streamed the data to Apache Kafka.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and Map Reduce, and loaded the resulting data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (Hive QL) to study customer behaviour.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Scheduled and executed workflows in Oozie to run various jobs.

Environment: Java, Scala, Hadoop, Hortonworks, AWS, HDFS, YARN, Map Reduce, Hive, Pig, Spark, Flume, Kafka, Sqoop, Oozie, Zookeeper, Oracle, Teradata, MySQL

Confidential, OH

Hadoop Developer

Responsibilities:

  • Worked on cloud platform which was built with a scalable distributed data solution using Hadoop on a 40-node cluster using AWS cloud to run analysis on 25+ Terabytes of customer usage data.
  • Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
  • Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
  • Worked on performance analysis and improvements for Hive and Pig scripts at MapReduce job tuning level.
  • Installed and configured the Hadoop cluster. Worked with the Cloudera Support Team to fine-tune the cluster.
  • Developed a custom File System plugin for Hadoop so it can access files on Hitachi Data Platform.
  • Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic.
  • Performed data ingestion from multiple internal clients using Apache Kafka. Developed KStreams using Java for real-time data processing.
  • Involved in Optimization of Hive Queries.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Involved in Data Ingestion to HDFS from various data sources.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
  • Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
  • Extensive knowledge in NoSQL databases like HBase
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
  • Have good knowledge on writing and using the user defined functions in HIVE, PIG and MapReduce.
  • Helped the business team by installing and configuring Hadoop ecosystem components along with the Hadoop admin.
  • Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
  • Worked on loading log data into HDFS through Flume
  • Created and maintained technical documentation for executing Hive queries and Pig Scripts.
  • Worked on debugging and performance tuning of Hive & Pig jobs.
  • Used Oozie to schedule various jobs on the Hadoop cluster.
  • Used Hive to analyze the partitioned and bucketed data.
  • Worked on establishing connectivity between Tableau and Hive.

Environment: Hortonworks 2.4, Hadoop, HDFS, Map Reduce, MongoDB, Cloudera, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX

Confidential, FL

Hadoop Developer

Responsibilities:

  • Worked on a Hortonworks cluster, which is responsible for providing an open source platform based on Apache Hadoop for analyzing, storing and managing big data
  • Worked with analyst to determine and understand business requirements
  • Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
  • Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
  • Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
  • Created MapReduce programs to handle semi/unstructured data like XML, JSON, AVRO data files and sequence files for log files
  • Involved in submitting and tracking MapReduce jobs using Job Tracker
  • Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
  • Wrote a Hive UDF to sort struct fields and return complex data types
  • Created Hive tables from JSON data using a data serialization framework like AVRO
  • Wrote reusable custom Hive and Pig UDFs in Java and used existing UDFs from Piggybank and other sources
  • Experience in working with NoSQL database HBase in getting real time data analytics
  • Integrated Hive tables to HBase to perform row level analytics
  • Developed Oozie workflows for daily incremental loads, which use Sqoop to pull data from Teradata and Netezza and then import it into Hive tables
  • Involved in performance tuning using different execution engines such as Tez
  • Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS using AutoSys and Oozie coordinator jobs
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using MR Testing library

Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Tez, Hive, Pig, Oozie, Sqoop, Flume, Teradata, Netezza, Tableau

Confidential, CA

Hadoop Developer

Responsibilities:

  • Installed Cloudera distribution of Hadoop Cluster and services HDFS, Pig, Hive, Sqoop, Flume and MapReduce
  • Responsible for providing an open source platform based on Apache Hadoop for analyzing, storing and managing big data
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Responsible for managing data coming from different sources
  • Imported and exported data into HDFS and Hive using Sqoop
  • Wrote Hive queries
  • Involved in loading data from UNIX file system to HDFS
  • Created Hive tables, loaded them with data and wrote queries that run internally as MapReduce jobs, and performed data analysis as per the business requirements
  • Worked with analysts to determine and understand business requirements
  • Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
  • Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
  • Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
  • Created MapReduce programs to handle semi/unstructured data like XML, JSON, AVRO data files and sequence files for log files
  • Involved in submitting and tracking MapReduce jobs using Job Tracker
  • Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
  • Wrote a Hive UDF to sort struct fields and return complex data types
  • Created Hive tables from JSON data using a data serialization framework like AVRO
  • Wrote reusable custom Hive and Pig UDFs in Java and used existing UDFs from Piggybank and other sources
  • Experience in working with NoSQL database HBase in getting real time data analytics
  • Integrated Hive tables to HBase to perform row level analytics
  • Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files
  • Developed unit test cases for Mapper, Reducer and Driver classes using MR Testing library
  • Supported the operations team in Hadoop cluster maintenance, including commissioning and decommissioning nodes and upgrades
  • Provided technical assistance to all development projects
  • Hands-on experience with Qlik Sense for Data Visualization and Analysis on large data sets, drawing various insights
  • Created dashboards using Qlik Sense and performed Data extracts, Data blending, Forecasting, and table calculations

Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, Netezza, Qlik Sense

Confidential

Java Developer

Responsibilities:

  • Built the application based on Rational Unified Process (RUP)
  • Analyzed and developed UML models with Rational Rose, including class diagrams, sequence diagrams, use case diagrams and activity diagrams
  • Implemented the middle tier employing design patterns like MVC, Business Delegate, Service Locator, Session Façade and Data Access Objects (DAOs)
  • Developed using MVC architecture and employed the Struts Framework and used Validator Framework and Tiles Framework as a plug-in with struts
  • Developed user interface using JSP, JSP Tag libraries (JSTL) and Struts Tag Libraries
  • Used EJBs in the application and developed Session beans to house business logic at the middle tier level
  • Used Java Message Service (JMS) for reliable and asynchronous exchange of important information
  • Used Hibernate in data access layer to access and update the information in database
  • Implemented various XML technologies like XML schemas, JAXB parsers for cross platform data transfer
  • Used JSON to pass objects between web pages and server-side application
  • Used XSL-FO to generate PDF reports
  • Extensively worked on XML parsers (SAX/DOM)
  • Used WSDL and SOAP protocol for Web Services implementation
  • Used JDBC to access DB2 UDB database for accessing customer information
  • Developed application level logging using Log4J
  • Used CVS for version control and JUnit for unit testing
  • Involved in development of Tables, Indices, Stored procedures, Database Triggers and Functions
  • Involved in documenting the application

Environment: J2EE 1.7, WebSphere Application Server v8.0, RAD, JSP 2.0, EJB 3.1, Struts 2.0, JMS, JSON, JDBC, JNDI, XML, XSL, XSLT, XSL-FO, WSDL, SOAP, Hibernate 4.0, RUP, Rational Rose (2000), Log4J, JUnit, CVS, IBM DB2 v8.2, Red Hat Linux, RESTful web services
