Senior Hadoop Developer Resume

Maryland Heights, MO

PROFESSIONAL SUMMARY:

  • Over 7 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • More than 4 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, Zookeeper, Sqoop, Flume, Oozie and AWS.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
  • Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery and AngularJS.
  • Excellent understanding and knowledge of NOSQL databases like MongoDB, HBase, and Cassandra.
  • Strong experience working with different Hadoop distributions such as Cloudera, Hortonworks, MapR and Apache.
  • Experience in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH 5.X) distributions and on Amazon web services (AWS).
  • Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation, RedShift which provides fast and efficient processing of Big Data.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability and YARN, with a good understanding of workload management, scalability and distributed platform architectures.
  • Good understanding of R Programming, Data Mining and Machine Learning techniques.
  • Strong experience and knowledge of real time data analytics using Storm, Kafka, Flume and Spark.
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Experience in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement.
  • Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Experience in extending Hive and Pig core functionality with custom UDFs and UDAFs.
  • Debugged MapReduce jobs using Counters and MRUnit testing.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Experience in configuring various Storm topologies to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
  • Extensive experience working with Spark features such as RDD transformations, Spark MLlib and Spark SQL.
  • Experienced in moving data from different sources using Kafka producers, consumers and preprocess data using Storm topologies.
  • Experienced in migrating ETL logic to Pig Latin scripts, including transformations and join operations.
  • Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
  • Hands-on experience in implementing Sequence files, Combiners, Counters, Dynamic Partitions and Bucketing for best practices and performance improvement (a minimal PySpark sketch appears after this list).
  • Highly knowledgeable in streaming data from different sources such as log files, JMS and application sources into HDFS using Flume.
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
  • Experience in testing MapReduce programs using MRUnit and JUnit.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
  • Extensive experience in working with SOA based architectures using Rest based web services using JAX-RS and SOAP based web services using JAX-WS.
  • Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews.
  • Worked on various tools and IDEs such as Eclipse, IBM Rational, Visio, the Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
  • Experience in different application servers like JBoss/Tomcat, WebLogic, IBM WebSphere.
  • Experience in working with Onsite-Offshore model.
  • Implemented a logging framework, the ELK stack (Elasticsearch, Logstash & Kibana), on AWS.
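
The sketch below illustrates, in PySpark, the dynamic-partitioning and bucketing approach referenced in the summary above. It is a minimal, hedged example: the table name, column names and bucket count are illustrative assumptions, not taken from any specific project.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partitioning-demo")
         .enableHiveSupport()
         .getOrCreate())

# Allow Hive-style dynamic partitioning when writing.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

events = spark.read.parquet("/data/raw/web_events/")   # hypothetical input path

# Partition by date and bucket by user id so queries can prune partitions
# and joins on user_id avoid full shuffles.
(events.write
       .partitionBy("event_date")
       .bucketBy(32, "user_id")
       .sortBy("user_id")
       .mode("overwrite")
       .saveAsTable("analytics.web_events"))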

TECHNICAL SKILLS:

Programming Languages: Java, Python, Scala, Shell Scripting, SQL, PL/SQL.

J2EE Technologies: Java, Spring, Servlets, SOAP/REST services, JSP, JDBC, XML, Hibernate.

Big Data Ecosystem: HBase, Hortonworks, MapReduce, Hive, Pig, Sqoop, Impala, Cassandra, Oozie, Zookeeper, Flume, Ambari, Storm, Spark and Kafka.

Databases: NoSQL, Oracle 10g/11g/12c, SQL Server 2008/2008 R2/2012/2014/2016/2017, MySQL.

Database Tools: Oracle SQL Developer, MongoDB, TOAD and PL/SQL Developer.

Modeling Tools: UML on Rational Rose 4.0/7.5/7.6/8.1

Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, Ajax, CSS3.

Servers & Tools: WebLogic, WebSphere, Apache Cassandra, Tomcat, Eclipse, NetBeans, WinSCP.

Operating Systems: Windows, UNIX, Linux (Ubuntu, CentOS), Solaris, Windows Server 2003/2008/2012/2016.

Frameworks: MVC, Struts, Log4j, JUnit, Maven, Ant, Web Services.

PROFESSIONAL EXPERIENCE:

Confidential, Maryland Heights, MO

Senior Hadoop Developer

Responsibilities:

  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Analyzed the SQL scripts and designed the solution for implementation in PySpark.
  • Designed and implemented MapReduce based large-scale parallel relation-learning system.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
  • Installed and configured a multi-node, fully distributed Hadoop cluster.
  • Developed and delivered quality services on-time and on-budget. Solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig and other web technologies.
  • Involved in end-to-end data processing including ingestion, processing, quality checks and splitting.
  • Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake.
  • Involved in scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
  • Worked with NoSQL databases like HBase to create tables and store data. Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
  • Wrote Pig scripts to store data into HBase.
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team.
  • Experienced on loading and transforming of large sets of structured, semi structured and unstructured data.
  • Extracted files from RDBMS through Sqoop and placed in HDFS and processed.
  • Used Spark Streaming to collect this data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase); see the streaming sketch after this list.
  • Involved in Installing Hadoop Ecosystem components.
  • Responsible to manage data coming from different sources.
  • Setup Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning and performance tuning.
  • Wrote complex MapReduce programs.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in HDFS maintenance and administering it through the Hadoop Java API.
  • Configured the Fair Scheduler to provide service level agreements for multiple users of the cluster.
  • Loaded data into the cluster from dynamically generated files using Flume and from RDBMS using Sqoop.
  • Sound knowledge in programming Spark using Scala.
  • Involved in writing Java APIs for interacting with HBase.
  • Involved in writing Flume and Hive scripts to extract, transform and load data into the database.
  • Used HBase as the data store.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Created an interface to convert mainframe data into ASCII.
  • Experienced in installing, configuring and using Hadoop Ecosystem components.
  • Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
  • Knowledge in performance troubleshooting and tuning Hadoop clusters.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Gained good experience with NoSQL databases such as HBase.
  • Experienced in running query using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Used MRUnit and JUnit for unit testing of MapReduce code.
  • Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata to HDFS, and loaded data from HDFS into Hive and Impala.
  • Load and transform large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Monitored and managed the Hadoop cluster using Apache Ambari.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Experience in administering installation, configuration, troubleshooting, security, backup, performance monitoring and fine-tuning of Red Hat Linux.
  • Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
  • Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
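
Below is a minimal PySpark Structured Streaming sketch of the Kafka-to-HBase flow described in the bullets above. The broker address, topic name and schema are illustrative assumptions, and the HBase write is stubbed behind a placeholder sink, since the actual persistence step depends on the connector available on the cluster (for example an HBase Spark connector or happybase).

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

schema = StructType().add("learner_id", StringType()).add("event", StringType())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker address
          .option("subscribe", "learner-events")               # assumed topic name
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def persist_batch(batch_df, batch_id):
    # Placeholder sink: in the real pipeline this write would go through an
    # HBase connector; parquet on HDFS stands in here to keep the sketch runnable.
    batch_df.write.mode("append").parquet("/data/learner_model/")

query = events.writeStream.foreachBatch(persist_batch).start()
query.awaitTermination()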

Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Hortonworks, Oracle 10g/11g/12c, Teradata, Cassandra, HDFS, Data Lake, Spark, MapReduce, Ambari, Cloudera, Tableau, Snappy, Zookeeper, NoSQL, Shell Scripting, Ubuntu, Solr.

Confidential, Dublin, OH

Hadoop Developer/Administrator

Responsibilities:

  • Designed, developed, debugged, tested and promoted Java/ETL code into various environments from DEV through PROD.
  • Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop. Involved in unit testing and delivered unit test plans and results documents.
  • Collected and aggregated large amounts of web log data from various sources such as web servers, mobile and network devices using Apache Flume and stored the data in HDFS for analysis.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Installed Hadoop, MapReduce and HDFS, and developed multiple jobs in Pig and Hive for data cleaning and pre-processing.
  • Installed, monitored and maintained hardware/software related issues on Linux/UNIX systems .
  • Investigated, installed and configured software fail-over system for production Linux servers.
  • Administered Pig, Hive and HBase, installing updates, patches and upgrades.
  • Extensively involved in Design phase and delivered Design documents.
  • Extensively involved in writing ETL Specifications for Development and conversion projects.
  • Worked on Oozie workflow engine for job scheduling.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Supported all teams engaged in implementing new customers, including vendors helping to establish new products.
  • Experienced in managing and reviewing the Hadoop log files.
  • Responsible for importing log files from various sources into HDFS using Flume .
  • Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (a minimal query sketch appears after this list).
  • Defined and created data model, tables, views, queries etc. to support business requirements.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Populated HDFS and Cassandra with vast amounts of data using Apache Kafka .
  • Implemented a six-node CDH4 Hadoop cluster on CentOS.
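
A minimal sketch of the kind of partition-pruned HiveQL used to compute reporting metrics, run here through PySpark's Hive support. The table and column names (weblogs, event_date, page) are illustrative assumptions.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("weblog-metrics")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate hits per page per day from a partitioned, bucketed web-log table;
# filtering on the partition column lets Hive prune partitions.
daily_hits = spark.sql("""
    SELECT event_date, page, COUNT(*) AS hits
    FROM weblogs
    WHERE event_date >= '2017-01-01'
    GROUP BY event_date, page
""")
daily_hits.show()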

Environment: HDFS, MapReduce, Flume, Hive, Informatica 9.1/8.1/7.1/6.1, Oracle 11g, Sqoop, Oozie, Pig, ETL, Hadoop 2.x, NoSQL, Talend, Flat files, AWS (Amazon Web Services), Hortonworks, Shell Scripting.

Confidential, Manhattan, New York

Hadoop Developer

Responsibilities:

  • Actively involved in the installation, configuration, design, development and maintenance of a Hadoop cluster and its tool set across the complete software development life cycle, following an Agile methodology.
  • Worked on the latest Hadoop distribution, Hortonworks Data Platform (HDP 2.x).
  • Worked on both batch and streaming data processing, with ingestion to NoSQL and HDFS in different file formats such as Parquet and Avro.
  • Worked on integrating Kafka with Spark Streaming for high-speed data processing.
  • Developed multiple Kafka producers and consumers per business requirements and customized partitioning to get optimized results (see the producer/consumer sketch after this list).
  • Worked on data pipelines per business requirements and scheduled them using Oozie.
  • Worked on advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala and Python as required.
  • Worked on cluster coordination, data capacity planning and node forecasting using ZooKeeper.
  • Worked with the Spark API for better optimization of existing algorithms, including SparkContext, Spark SQL, Spark Streaming and Spark DataFrames.
  • Involved in configuring and developing the Hadoop environment on AWS cloud services such as EC2, EMR, Redshift, Route 53 and CloudWatch.
  • Experience in machine learning, training data models using supervised classification algorithms.
  • Worked on Spark and MLlib to develop a linear regression model for logistics information (a minimal regression sketch appears after this list).
  • Exported and analyzed data to the RDBMS for visualization and to generate reports for the BI team.
  • Supported setting up the QA environment and updating configurations for implementing scripts.
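
A minimal sketch of a keyed Kafka producer and consumer using the kafka-python client. The broker address, topic and key field are illustrative assumptions; keying records by account id stands in for the customized partition routing mentioned above, since records with the same key always land in the same partition.

import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092"]   # assumed broker address
TOPIC = "orders"             # assumed topic name

# Producer: key each record so all events for one account go to one partition.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, key="acct-42", value={"amount": 99.5, "status": "NEW"})
producer.flush()

# Consumer: joins a consumer group so partitions are balanced across instances.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="order-processors",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.partition, record.key, record.value)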
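
And a minimal Spark MLlib linear regression sketch along the lines of the model mentioned above. The input path, feature columns and label column are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("lr-model").getOrCreate()

df = spark.read.parquet("/data/logistics_features/")   # hypothetical input path

# Assemble numeric columns into a single feature vector for MLlib.
assembler = VectorAssembler(inputCols=["distance_km", "weight_kg"],  # assumed features
                            outputCol="features")
train = assembler.transform(df).select("features", "delivery_hours")  # assumed label

lr = LinearRegression(featuresCol="features", labelCol="delivery_hours")
model = lr.fit(train)
print(model.coefficients, model.intercept)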

Environment: Scala, Spark SQL, Spark Streaming, Spark DataFrames, Spark MLlib, HDFS, Hive, Sqoop, Kafka, Shell Scripting, Cassandra, Python, AWS, Tableau, SQL Server, GitHub, Maven

Confidential

Big Data Engineer/Developer

Responsibilities:

  • Developed several advanced Map Reduce programs to process data files received.
  • Developed Map Reduce Programs for data analysis and data cleaning.
  • Firm knowledge on various summarization patterns to calculate aggregate statistical values over dataset.
  • Experience in implementing joins in the analysis of dataset to discover interesting relationships.
  • Completely involved in the requirement analysis phase.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Worked on partitioning the Hive tables and running the scripts in parallel to reduce the run time of the scripts.
  • Strong expertise in internal and external Hive tables; created Hive tables to store the processed results in a tabular format.
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive.
  • Developed Pig Scripts and Pig UDFs to load data files into Hadoop.
  • Analyzed the data by performing Hive queries and running Pig scripts.
  • Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
  • Strong knowledge on the process of creating complex data pipelines using transformations, aggregations, cleansing and filtering.
  • Experience in writing cron jobs to run at regular intervals.
  • Developed MapReduce jobs for log analysis, recommendation and analytics (a minimal Hadoop Streaming sketch appears after this list).
  • Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Experience in managing and reviewing Hadoop log files.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
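
A minimal Hadoop Streaming sketch of the kind of log-analysis MapReduce job listed above, written as stdin/stdout Python scripts. The assumption that the HTTP status code is the ninth whitespace-separated field reflects a common access-log layout, not a specific project's format.

#!/usr/bin/env python
# mapper.py - emit (status_code, 1) for each access-log line.
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 8:                 # assumed: status code is the 9th field
        print("%s\t1" % fields[8])

#!/usr/bin/env python
# reducer.py - sum the counts per status code (input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))

The two scripts would be submitted with the standard hadoop-streaming jar (the exact jar path varies by distribution), passing them as -mapper and -reducer along with -input and -output HDFS paths.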

Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, GIT, UNIX Shell scripting, PostgreSQL, Linux.

Confidential

Java Developer

Responsibilities:

  • Design of Java Servlets and Objects using J2EE standards.
  • Designed use cases, activities, states, objects and components.
  • Developed the UI pages using HTML, DHTML, JavaScript, Ajax, jQuery, JSP and tag libraries.
  • Developed front-end screens using JSP and Tag Libraries.
  • Performing validations between various users.
  • Coded HTML, JSP and Servlets.
  • Developed internal application using Angular and Node.js connecting to Oracle on the backend.
  • Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using a SAX parser.
  • Created new connections through application coding for better access to the DB2 database and was involved in writing SQL and PL/SQL stored procedures, functions, sequences, triggers, cursors, object types, etc.
  • Involved in testing and deploying on the development server.
  • Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
  • Involved in the design of database tables in Oracle.

Environment: Java 1.7, J2EE, Apache Tomcat, CVS, JSP, Servlets, Struts, PL/SQL and Oracle.