Hadoop Developer Resume

Charlotte, NC

SUMMARY

  • 8+ years of extensive professional IT experience, including 5+ years of Hadoop experience, capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Well experienced in Hadoop ecosystem components like Hadoop, MapReduce, Cloudera, Hortonworks, Mahout, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
  • Experience in using automation tools like Chef for installing, configuring and maintaining Hadoop clusters.
  • Led innovation by exploring, investigating, recommending, benchmarking and implementing data-centric technologies for the platform.
  • Technical leadership role responsible for developing and maintaining the data warehouse and Big Data roadmap, ensuring the data architecture aligns with the business-centric roadmap and analytics capabilities.
  • Experienced in Hadoop Architect and Technical Lead roles, providing design solutions and Hadoop architectural direction.
  • 4+ years of industrial experience in data manipulation and Big Data analytics using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Cassandra, Solr and ZooKeeper.
  • Hands-on expertise in working with and designing row keys and schemas for NoSQL databases like MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
  • Extensively worked on Spark with Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle (see the sketch after this list).
  • Excellent programming skills at a higher level of abstraction using Scala, Java and Python.
  • Hands-on experience in developing Spark applications using Spark APIs like Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Strong experience and knowledge of real-time data analytics using Spark, Kafka and Flume.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Ran Apache Hadoop, CDH and MapR distributions on Amazon Elastic MapReduce (EMR) over EC2.
  • Expertise in developing Pig Latin scripts and using Hive Query Language.
  • Developed custom UDFs and UDAFs in Java to extend Hive and Pig core functionality.
  • Created Hive tables to store structured data in HDFS and processed it using HiveQL.
  • Worked on GUI-based Hive interaction tools like Hue and Karmasphere for querying the data.
  • Experience in validating and cleansing data using Pig statements and hands-on experience in developing Pig macros.
  • Designed ETL workflows in Tableau, deployed data from various sources to HDFS, and generated reports using Tableau.
  • Performed clustering, regression and classification using the machine learning libraries Mahout and Spark MLlib.
  • Good experience with use-case development, with Software methodologies like Agile and Waterfall.
  • Working knowledge of installing and maintaining Cassandra by configuring the cassandra.yaml file per business requirements, and performed reads/writes using Java JDBC connectivity.
  • Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise data warehouses.
  • Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC.
  • Good knowledge of build tools like Maven, Gradle and Ant.
  • Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache and Amazon EMR.
  • Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
  • Used various Project Management services like JIRA for tracking issues, bugs related to code and GitHub for various code reviews.
  • Hands-on knowledge of core Java concepts like exceptions, collections, data structures, I/O, multi-threading, and serialization/deserialization of streaming applications.
  • Experience in software design, development and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, JavaBeans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML and AJAX, with working familiarity with the React JavaScript library.
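
As a concrete illustration of the Spark-with-Hive analytics work above, here is a minimal Scala sketch. The session setup and API calls are standard Spark; the database, table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveAnalyticsJob {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support, running on top of Hadoop
    val spark = SparkSession.builder()
      .appName("HiveAnalyticsJob")
      .enableHiveSupport()
      .getOrCreate()

    // Read a (hypothetical) Hive table of structured data stored in HDFS
    val orders = spark.sql("SELECT customer_id, amount FROM sales.orders")

    // Aggregate with the DataFrame API and persist back to Hive for reporting
    orders.groupBy("customer_id")
      .sum("amount")
      .write.mode("overwrite")
      .saveAsTable("sales.customer_totals")

    spark.stop()
  }
}
```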

TECHNICAL SKILLS

  • APACHE (7 years)
  • APACHE HADOOP HDFS (6 years)
  • APACHE HADOOP SQOOP (6 years)
  • APACHE HBASE (6 years)
  • Hadoop (6 years)
  • Apache Hadoop 2.0.0
  • Pig 0.11
  • Hive 0.10
  • Sqoop 1.4.3
  • Flume
  • MapReduce
  • JSP
  • Struts 2.0
  • NoSQL
  • HDFS
  • Teradata
  • Sqoop
  • LINUX
  • Oozie
  • Cassandra
  • Hue
  • HCatalog
  • Java
  • IBM Cognos
  • Oracle 11g/10g
  • Microsoft SQL Server
  • Microsoft SSIS
  • DB2 LUW
  • TOAD for DB2
  • IBM Data Studio
  • AIX 6.1
  • UNIX Scripting (8 years)

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential - Charlotte, NC

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Understood business needs, analyzed functional specifications and mapped them to development tasks.
  • Involved in loading data from Mainframe DB2 into HDFS using Sqoop.
  • Handled delta processing and incremental updates using Hive.
  • Responsible for daily ingestion of data from DATALAKE to CDB Hadoop tenant system.
  • Developed Pig Latin scripts for transformations while extracting data from source systems.
  • Worked on data-issue-related tickets and provided fixes.
  • Monitored and fixed production job failures.
  • Reviewed team members' design documents and code.
  • Documented system processes and procedures for future reference, including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Implemented data ingestion from multiple sources such as IBM mainframes and Oracle.
  • Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
  • Worked on partitioning and bucketing of Hive tables and ran scripts in parallel to improve performance.
  • Thorough knowledge of Spark architecture and how RDDs work internally (see the sketch after this list).
  • Exposure to Spark SQL.
  • Experienced in the Scala programming language and used it extensively with Spark for data processing.
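
A minimal Scala sketch of the Spark core work referenced above, showing how an RDD lineage stays lazy until an action triggers the job. The HDFS paths and record layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RecordCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecordCounts"))

    // Transformations only build the RDD lineage; nothing executes yet
    val lines  = sc.textFile("hdfs:///data/incoming/records")   // hypothetical path
    val fields = lines.map(_.split('|')).filter(_.nonEmpty)
    val counts = fields.map(f => (f(0), 1L)).reduceByKey(_ + _) // shuffle boundary

    // The action below makes the DAG scheduler split the lineage into stages and run them
    counts.saveAsTextFile("hdfs:///data/output/record_counts")
    sc.stop()
  }
}
```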

Environment: HDFS, Hive, Pig, HBase, Unix Shell Script, Talend, Spark, Scala.

Hadoop Developer

Confidential - Chadds Ford, PA

Responsibilities:

  • Worked on a Hadoop cluster that ranged from 4-8 nodes during the pre-production stage and was sometimes extended up to 24 nodes during production.
  • Built APIs that will allow customer service representatives to access the data and answer queries.
  • Designed changes to transform current Hadoop jobs to HBase.
  • Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
  • Extended the functionality of Hive and Pig with custom UDFs and UDAFs (see the sketch after this list).
  • Developed Spark applications using Scala.
  • Implemented bucketing and partitioning using Hive to assist users with data analysis.
  • Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Extracted large volumes of data feeds from different data sources, performed transformations and loaded the data into various targets.
  • Developed database management systems for easy access, storage and retrieval of data.
  • Performed DB activities such as indexing, performance tuning, and backup and restore.
  • Used Sqoop to import data from RDBMS sources into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
  • Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
  • Implemented AJAX, JSON and JavaScript to create interactive web screens.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
  • Involved in creating Hive tables and applying HiveQL to them, which invokes and runs MapReduce jobs automatically.
  • Supported applications running on Linux machines.
  • Developed data-formatted web applications and deployed scripts using HTML5, XHTML, CSS and client-side scripting with JavaScript.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Participated in requirement gathering from the Experts and Business Partners and converting the requirements into technical specifications.
  • Used Zookeeper to manage coordination among the clusters.
  • Analyzed Cassandra and compared it with other open-source NoSQL databases to find which best suited the current requirements.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that execute independently based on time and data availability.
  • Assisted application teams in installing Hadoop updates, operating system patches and version upgrades when required.
  • Assisted in cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files.
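
The custom UDFs mentioned above were written in Java; the sketch below shows the same classic Hive UDF pattern in Scala (also JVM bytecode), with a hypothetical function that normalizes state codes.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Classic Hive UDF: Hive discovers the evaluate() method by reflection.
// The normalization rule here is a hypothetical example.
class NormalizeState extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

After packaging the class into a jar, it would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_state AS 'NormalizeState'; and then used like any built-in function in HiveQL.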

Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, LINUX, Oozie, Cassandra, Hue, HCatalog, Java, IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting

Hadoop Developer

Confidential - Boston, MA

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, the HBase database and Sqoop.
  • Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Coordinated with business customers to gather business requirements.
  • Interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in the Design phase and delivered Design documents.
  • Involved in Testing and coordination with business in User testing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data (see the sketch after this list).
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, joins and pre-aggregations before storing the data in HDFS.
  • Loaded and transformed large sets of structured data.
  • Responsible for managing data coming from different sources.
  • Involved in creating Hive tables, loading data and writing Hive queries.
  • Utilized the Apache Hadoop environment from Cloudera.
  • Created the data model for Hive tables.
  • Involved in unit testing and delivered unit test plans and results documents.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Worked on Oozie workflow engine for job scheduling.
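
A minimal sketch of the log-parsing Hive table described above. It is shown through Spark's Hive support to keep the examples in one language; the same DDL runs in the Hive CLI. The table name, HDFS path, column layout and regex are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object LogTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LogTableSetup")
      .enableHiveSupport()
      .getOrCreate()

    // External Hive table over raw logs in HDFS; RegexSerDe splits each
    // line into host, timestamp, and request columns
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS logs_raw (host STRING, ts STRING, request STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES ('input.regex' = '(\\S+) (\\S+) (.*)')
      LOCATION 'hdfs:///data/logs/raw'
    """)

    // Once the table exists, the logs can be queried like any other table
    spark.sql("SELECT host, COUNT(*) AS hits FROM logs_raw GROUP BY host").show()
  }
}
```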

Environment: HTML5, SCSS, CSS3, Mixpanel, Mustache, Glyphicons, Bootstrap, AngularJS, Spring AOP, Hibernate, Promises, Bower, NPM, React.js, Redux, .NET, AWS, RESTful, Node.js

Hadoop Developer

Confidential - St. Louis, MO

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, Sqoop & Spark.
  • Developed Spark code using Scala for faster processing of data.
  • Followed the Agile development methodology to develop the application.
  • Developed Spark SQL jobs to load tables into HDFS and run SELECT queries on top of them.
  • Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this list).
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
  • Integrated Apache Storm with Kafka to perform web analytics.
  • Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into the target database.
  • Created, altered and deleted topics (Kafka queues) as required.
  • Performance tuning using partitioning and bucketing of Impala tables.
  • Experience in NoSQL databases such as HBase and MongoDB; involved in cluster maintenance and monitoring.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in loading data from UNIX file system to HDFS.
  • Created an email notification service that alerts the team that requested the data upon completion of a job.
  • Worked on NoSQL databases, which differ from classic relational databases.
  • Conducted requirements-gathering sessions with various stakeholders.
  • Involved in knowledge transition activities to the team members.
  • Successful in creating and implementing complex code changes.
  • Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch.
  • Experience in S3, CloudFront and Route 53.
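
A minimal Scala sketch of the Kafka-to-Spark-Streaming integration described above, using the spark-streaming-kafka-0-10 direct stream API. The broker address, topic, consumer group and HDFS path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object ClickstreamIngest {
  def main(args: Array[String]): Unit = {
    // Micro-batch interval of 10 seconds: the stream is divided into batches
    // that the Spark engine processes as ordinary batch jobs
    val conf = new SparkConf().setAppName("ClickstreamIngest")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical broker and consumer group
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-ingest")

    // Direct stream from the (hypothetical) "clicks" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clicks"), kafkaParams))

    // Each micro-batch is written to HDFS as a timestamped directory
    stream.map(_.value).saveAsTextFiles("hdfs:///data/clickstream/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```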

Environment: Hadoop v2/YARN 2.4, Spark, AWS, MapReduce, Teradata 15.0, Hive, REST, Sqoop, Flume, Pig, Cloudera, Kafka, SSRS.

Java Developer

Confidential

Responsibilities:

  • Involved in requirements collection & analysis from the business team.
  • Created the design documents with use case, class and sequence diagrams using Rational Rose.
  • Implemented the MVC architecture using Apache Struts framework.
  • Implemented Action classes and server-side validations for account activity, payment history and transactions.
  • Implemented views using Struts tags, JSTL and Expression Language.
  • Implemented session beans to handle the business logic for fund transfer, loan, credit card & fixed deposit modules.
  • Worked with various Java patterns such as Singleton and Factory at the business layer for effective object behavior (see the sketch after this list).
  • Worked on the Java collections API for handling the data objects between the business layers and the front end.
  • Developed unit test cases using JUnit.
  • Developed Ant scripts and built the project using Apache Ant.
  • Used ClearCase for source code maintenance.
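
The patterns above were implemented in Java; here is a minimal Scala equivalent of the same Singleton and Factory ideas, with hypothetical fund-transfer channels.

```scala
// Business-layer abstraction; the channel names are hypothetical
trait TransferService {
  def transfer(from: String, to: String, amount: BigDecimal): Unit
}

private class WireTransferService extends TransferService {
  def transfer(from: String, to: String, amount: BigDecimal): Unit =
    println(s"wiring $amount from $from to $to")
}

private class AchTransferService extends TransferService {
  def transfer(from: String, to: String, amount: BigDecimal): Unit =
    println(s"ACH transfer of $amount from $from to $to")
}

// A Scala `object` is a language-level singleton; it also acts as the factory
object TransferServiceFactory {
  def forChannel(channel: String): TransferService = channel match {
    case "wire" => new WireTransferService
    case "ach"  => new AchTransferService
    case other  => throw new IllegalArgumentException(s"unknown channel: $other")
  }
}
```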

Environment: J2EE 1.4, Java, Tiles, JSP 1.2, JavaMail, ClearCase, ANT, JavaScript, JMS

Java Developer

Confidential

Responsibilities:

  • Involved in design and implementation of server-side programming.
  • Involved in gathering requirements, analyzed them and prepared high-level documents.
  • Participated in all client meetings to understand the requirements.
  • Actively involved in designing and data modelling using the Rational Rose tool (UML).
  • Involved in the design of the SPACE database.
  • Designed and developed user interfaces and menus using HTML, JSP, JSP custom tags and JavaScript.
  • Implemented the user interface using the Spring Tiles framework.
  • Developed, deployed and tested JSPs and Servlets in WebLogic.
  • Used Eclipse as the IDE, integrated WebLogic with Eclipse to deploy and develop applications, and used JDBC to connect to the database (see the sketch below).
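
A minimal sketch of JDBC connectivity like that described above, written in Scala against the standard java.sql API. The connection URL, credentials, table and query are hypothetical.

```scala
import java.sql.DriverManager

object JdbcExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle thin-driver URL and credentials
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:orcl", "app_user", "secret")
    try {
      // Parameterized query avoids SQL injection and statement re-parsing
      val stmt = conn.prepareStatement("SELECT name FROM accounts WHERE id = ?")
      stmt.setInt(1, 42)
      val rs = stmt.executeQuery()
      while (rs.next()) println(rs.getString("name"))
    } finally conn.close()
  }
}
```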

Environment: Struts Framework, Java 1.3, XML, Data Modelling, JDBC, SQL, PL/SQL, JMS, Web Services, SOAP, Solaris 9, ANT, Toad, Eclipse.
