Sr. Hadoop Developer Resume

Blue Shield, MI

SUMMARY:

  • Sr. Hadoop Developer with 8 years of programming, technology, and engineering experience developing software, combining critical thinking, problem solving, and leadership.
  • Experienced professional with a strong background in distributed file systems in big data environments.
  • Able to understand the complex processing needs of big data, with experience developing code and modules to address those needs.
  • Proficient as a Hadoop and Spark developer in moving data into Hadoop, Hadoop data access, and data pipelines with Apache Kafka.
  • IT experience in all phases of Hadoop development and Java development, along with application development and data modelling, across various roles over the years.
  • Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera, Hortonworks, and MapR. Good knowledge of Amazon Web Services (AWS).
  • In-depth experience using Big Data ecosystem tools such as MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, HBase, and ZooKeeper.
  • Developed Spark Scala code to cleanse data and perform ETL at different stages of the data pipeline.
  • Extensively worked on Spark Streaming, Apache Kafka, Apache Flume, and Apache Storm to fetch and transform live stream data.
  • Extensively worked on Spark components such as Spark SQL, MLlib, GraphX, and Spark Streaming.
  • Experience migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Experience importing and exporting data with Sqoop between relational databases and HDFS on Linux systems.
  • Experience using SequenceFile, Avro, and Parquet file formats; managing and reviewing Hadoop log files.
  • Experienced in all phases of the Software Development Life Cycle (SDLC).
  • Efficient in analysing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance.
  • Good experience in data transformation and storage with HDFS, MapReduce, and Spark, and in writing Spark applications in Python (PySpark), Scala, and Java. Also implemented API services using Python in Spark.
  • Used Spark SQL to load tables into HDFS and run SELECT queries on top of them.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Implemented machine learning techniques such as clustering and regression using the Spark API.
  • Experience creating data models for client's transactional logs and analysing data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).
  • Experience loading data from sources such as Teradata and DB2 into HDFS using Sqoop, and then into partitioned Hive tables.
  • Extensively worked with the Cloudera Distribution of Hadoop (CDH 5.x and CDH 4.x). Responsible for continuous monitoring and management of Elastic MapReduce clusters through the AWS console.
  • Experienced in administrative tasks such as Hadoop installation in pseudo-distributed mode and on multi-node clusters, and installation of Apache Ambari on Hortonworks Data Platform (HDP 2.5).
  • Developed web page interfaces using JSP, HTML5, and CSS3. Expertise in Java/J2EE, JDBC, ODBC, Servlets, JSP, Spring, and Hibernate.
  • Worked with various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ.
  • Skilled in version control tools such as Git. Developed web-based UIs using HTML, HTML5, XHTML, CSS, and JavaScript.
  • Developed stored procedures and queries using PL/SQL. Development experience with RDBMSs such as Confidential DB2, MS SQL Server, Teradata, and MySQL.
  • Experience with best practices for web services development and integration (both REST and SOAP).
  • Good scripting experience in Bash/shell and Python.
  • Experience using Git as a source code management tool, including creating and merging branches.
  • Experience using Jenkins for continuous integration and continuous deployment (CI/CD), with Ant and Maven as build tools.
  • Used Chef for configuration management of instances and CloudWatch for monitoring them. Working knowledge of Nagios.

TECHNICAL SKILLS:

Operating Systems: Windows, Linux distributions like Ubuntu, CentOS

Hadoop Distribution: Cloudera (CDH 3, CDH4, CDH5), Hortonworks

Programming Languages: Java, Python, Bash/Shell scripting, SQL, PL/SQL, Scala

Data stores: MySQL, SQL Server

Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, HBase, Sqoop, Spark, NiFi and Kafka

AWS: AWS EMR, Amazon S3, Amazon EC2, Amazon VPC, Lambda, Amazon Route 53, Amazon EBS, Elastic Load Balancing, Redshift, DynamoDB, Amazon CloudWatch, SNS, SES, SQS

RDBMS: Teradata, Confidential 9i, 10g, 11i, MS SQL Server, MySQL, and DB2

ETL: Talend and Informatica

Web Design Tools: HTML, CSS, JavaScript

Development/Build tools: Eclipse, Ant, Maven, IntelliJ

NoSQL Databases: Cassandra, MongoDB, HBase

Java Technologies: Servlets, JSP, JDBC, JUnit

Web frameworks: Spring, Hibernate

Build Tools: Ant, Maven

CI/CD Tools: Jenkins

Configuration Management Tools: Chef

Configuration Monitoring Tools: Nagios

SCM Tools: Git

Web/Application servers: Apache, Tomcat

PROFESSIONAL EXPERIENCE:

Confidential, Blue Shield, MI

Sr. Hadoop Developer

Responsibilities:

  • Built data pipelines for efficient and reliable data movement across systems, and built the next generation of data tools to enable the company to take full advantage of its data.
  • Led the team responsible for designing and developing real-time data flows and other data pipeline solutions.
  • Involved in migrating from an on-premises data centre to AWS.
  • Worked with Docker container snapshots, attaching to running containers, removing images, managing directory structures, and managing containers in AWS ECS.
  • Built and shipped highly scalable clickstream data pipelines and analytics systems on distributed data systems (Hadoop/AWS).
  • Built batch, real-time, and streaming analytics pipelines with data from event data streams, NoSQL stores, and APIs.
  • Used the NoSQL database MongoDB to store images and URIs.
  • Worked with a MapReduce 2 (YARN) setup. Created a Spark application to load data into a dynamic-partition-enabled Hive table.
  • Worked on stateful transformations in Spark applications. Implemented Spark RDDs in Scala.
  • Used Spark to improve the performance of and optimize existing Hadoop algorithms, working with SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience with tools like Oozie and Airflow to orchestrate jobs.
  • Performed data analysis on MongoDB data using Hive external tables. Exported the analysed data using Sqoop to generate reports for the BI team.
  • Converted queries written in Hive into Spark applications. Used Apache NiFi to decompress and move JSON files from the local file system to HDFS.
  • Transferred data between MySQL and HDFS in both directions using Sqoop.
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
  • Designed, configured, and managed public/private cloud infrastructures utilizing AWS.
  • Created Amazon EC2 instances, set up security groups, and configured Elastic Load Balancers.
  • Used auto scaling of instances to design cost-effective, fault-tolerant, and highly reliable systems.
  • Created detailed AWS security groups, which act as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
  • Scaled instances on demand using Auto Scaling and monitored them using CloudWatch.
  • Used Chef for configuration management of instances.
  • Adept at writing efficient Spark Scala code to generate aggregation functions on DataFrames according to business logic.
  • Experienced in using the DataStax Spark Connector to store data in Cassandra from Spark.
  • Ingested data from RDBMSs, performed data transformations, exported the transformed data to Cassandra per business requirements, and accessed Cassandra through Java services.
  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.

Environment: Apache Hive, Hadoop (HDFS) multi-node installation, AWS, UNIX Shell Scripting, Hadoop 2.7, Spark 1.4.1, Scala 2.10, SBT 0.13, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 2.x.y, Java, Confidential 11g, DataStax, Cassandra 4.8, CentOS, Windows, Python 2.7, MongoDB
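
The dynamic-partition load mentioned in this role can be illustrated with a minimal plain-Python sketch. It is not the Spark or Hive code used on the project: the function name, column names, and sample rows are hypothetical, and it only mimics how rows are routed to Hive-style `column=value` partitions based on the data itself rather than a value fixed in advance.

```python
from collections import defaultdict


def dynamic_partition(rows, partition_col):
    """Group rows by the value found in partition_col (Hive-style col=value keys)."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition is chosen from the row's own value, not declared up front.
        key = f"{partition_col}={row[partition_col]}"
        # The partition column lives in the partition name, so drop it from the row data.
        data = {k: v for k, v in row.items() if k != partition_col}
        partitions[key].append(data)
    return dict(partitions)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"id": 1, "state": "MI"},
    {"id": 2, "state": "TX"},
    {"id": 3, "state": "MI"},
]
partitioned = dynamic_partition(sample_rows, "state")
```

With static partitioning, by contrast, the partition value would be supplied by the caller for the whole load instead of being read from each row.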

Confidential, Tampa, Florida

Data Engineer

Responsibilities:

  • Responsible for developing and implementing strategic enterprise data lake capabilities for collecting, storing, and processing US Division data to support analytics, digital transformation, and operational improvements.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Managed Ubuntu Linux and Windows virtual servers on AWS EC2. Followed Agile methodology and used JIRA for issue tracking.
  • Translated complex functional and technical requirements into detailed designs and high-performing capabilities.
  • Used AWS S3 and local disks as the underlying storage for Hadoop (HDFS).
  • Ran Hive queries through Spark SQL, integrated with a Spark environment implemented in Scala and Python.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
  • Working experience with modern data streaming and processing tools such as Kafka, Apache Spark, Flink, Hive, and Pig.
  • Used Spark for interactive queries, stream processing, and integration with Cassandra for large volumes of data.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Workflow management: developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Participated in the design and build of data patterns and services (batch, real-time, and complex event handling), leveraging open technologies.
  • Developed data set processes for data modelling and mining. Recommended ways to improve data reliability, efficiency, and quality.
  • Involved in the complete Big Data flow of the application, from upstream data ingestion into HDFS through processing and analysing the data in HDFS.
  • Produced low-level designs for MapReduce, Hive, Impala, and shell scripts to process data. Developed Pig scripts for the analysis of semi-structured data.
  • Designed and implemented Spark jobs to support distributed data processing. Developed ETL processes using Spark, Scala, Hive, and HBase.
  • Worked on Kafka and Spark integration for real-time data processing.
  • Built a bulk data processing and ingestion service from Hadoop to Cassandra, with a thin REST layer on top for serving offline-computed data online.
  • Built an analytics environment based on Docker and AWS with standardized Python dependencies. Wrote the core libraries shared by all data scientists.

Environment: Java, J2EE 1.7, Eclipse, Apache Hive, HDFS, GitHub, Jenkins, Anthill Pro, Windows, Docker, Python, Django, Scala, Pig, Cloudera, Hadoop, AWS S3, EC2, Impala, Shell Scripting, Apache Web Server, Spark, Spark SQL, JIRA.

Confidential, Dallas, TX

Big Data Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop. Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API. Responsible for architecting and administering HDFS distributed file systems holding over 12 petabytes of data.
  • Improved the performance of and optimized existing Hadoop algorithms using SparkContext, Spark SQL, and Spark on YARN in Scala.
  • Extensively used ZooKeeper as job scheduler for Spark jobs. Moved bulk data into HBase using MapReduce integration.
  • Responsible for architecting Hadoop clusters with the Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH 5.3.x on Red Hat Linux.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Responsible for development, support, and maintenance of ETL (Extract, Transform, and Load) processes using Talend.
  • Used Teradata in both the Hadoop project and the ETL project.
  • Loaded data from various data sources into HDFS using Kafka. Designed and presented a plan for a POC on Impala.
  • Experienced in migrating HiveQL queries to Impala to minimize query response time.
  • Used NiFi to automate data movement between different Hadoop systems. Designed and implemented Spark jobs to support distributed data processing.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Worked on MongoDB for distributed storage and processing.
  • Involved in collecting and aggregating large amounts of log data using Apache and staging data in HDFS for further analysis.
  • Worked with SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Performed ETL data cleansing, integration, and transformation using Pig, managing data from disparate sources.
  • Exported analysed data to relational databases using Sqoop for visualization and report generation.

Environment: HDFS, Scala, Python, CDH5, HBase, NoSQL, RHEL 4/5/6, Hive, Pig, Hadoop, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, MongoDB
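
The map-side join technique listed in this role can be sketched in plain Python. This is an illustration of the idea only, not Hive's implementation: the small table is held in memory as a lookup and the large table is streamed past it, so no shuffle or reduce step is needed. The function name, keys, and sample tables are hypothetical.

```python
def map_side_join(large_rows, small_rows, key):
    """Inner-join large_rows to small_rows on key, holding only small_rows in memory."""
    # Build an in-memory lookup from the small table (assumes key is unique there).
    lookup = {row[key]: row for row in small_rows}
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Merge the matching small-table columns into the streamed row.
            yield {**match, **row}


# Hypothetical sample tables, for illustration only.
products = [{"product_id": 10, "name": "widget"}, {"product_id": 20, "name": "gadget"}]
orders = [
    {"order_id": 1, "product_id": 10},
    {"order_id": 2, "product_id": 30},
    {"order_id": 3, "product_id": 20},
]
joined = list(map_side_join(orders, products, "product_id"))
```

The approach only pays off when one side of the join is small enough to fit in each mapper's memory.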

Confidential

Hadoop Developer

Responsibilities:

  • Involved in implementing a Hadoop cluster and Hive for development and test environments.
  • Improved the performance of and optimized existing Hadoop algorithms using SparkContext, Spark SQL, and Spark on YARN in Scala. Analysed data per business requirements using Hive queries.
  • Imported data from relational databases into HDFS using Sqoop. Performed administration, troubleshooting, and maintenance of ETL and ELT processes.
  • Hands-on experience writing custom UDFs and custom input and output formats.
  • Created Hive tables, loaded values, and generated ad hoc reports from the table data. Experience in commissioning and decommissioning Hadoop cluster nodes.
  • Scheduled jobs using Oozie workflows. Implemented Apache Impala for data processing on top of Hive.
  • Developed Pig and Hive scripts for data processing on HDFS. Loaded the generated HFiles into HBase for faster access to a large customer base without a performance hit.
  • Improved Hadoop cluster performance by tuning configuration parameters for the OS kernel, disk I/O, networking, memory, reducer buffers, mapper tasks, JVM tasks, and HDFS.
  • Integrated multiple data sources (SQL Server, DB2, MySQL) into the Hadoop cluster and analysed the data through Hive-HBase integration.
  • Used Apache Solr to search data in the HDFS Hadoop cluster. Designed and developed the data ingestion component.
  • Developed various custom filters and applied pre-defined filters on HBase data using the API.
  • Analysed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
  • Monitored workload and job performance and performed capacity planning using the Cloudera Manager interface.

Environment: Java, J2EE 1.7, 1.8, Eclipse, Puppet, HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL, Apache Spark, Spark SQL, Linux, Apache Impala
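
The web log analysis described in this role (unique visitors and page views per day) can be illustrated with a minimal plain-Python sketch. It is not the HiveQL used on the project; the record layout, function name, and sample data are hypothetical. The logic corresponds roughly to a `COUNT(DISTINCT visitor)` and `COUNT(*)` grouped by day.

```python
from collections import defaultdict

# Hypothetical log records: (date, visitor_id, page). Not taken from any real log.
SAMPLE_LOG = [
    ("2016-01-01", "u1", "/home"),
    ("2016-01-01", "u1", "/cart"),
    ("2016-01-01", "u2", "/home"),
    ("2016-01-02", "u1", "/home"),
]


def daily_stats(records):
    """Return {date: (unique_visitors, page_views)} for the given records."""
    visitors = defaultdict(set)
    views = defaultdict(int)
    for date, visitor_id, _page in records:
        visitors[date].add(visitor_id)  # a set keeps each visitor once per day
        views[date] += 1                # every record counts as one page view
    return {date: (len(visitors[date]), views[date]) for date in views}


stats = daily_stats(SAMPLE_LOG)
```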

Confidential

Java Developer

Responsibilities:

  • Involved in requirement gathering, analysis, and design using UML and OOAD. Worked on the presentation layer using JSP and Servlets.
  • Coded SQL and PL/SQL for backend processing and retrieval logic. Involved in building and deploying the application using Ant.
  • Used the Spring DAO concept to interact with the database using the JDBC template and Hibernate template.
  • Used Git as the version control tool and JIRA for project tracking.
  • Used Ant and Maven as build tools on Java projects to produce build artifacts from source code, and Jenkins as the CI/CD tool.
  • Responsible for coding, unit testing, functional testing, and regression testing.
  • Participated in technical discussion for architecture design, database and code enhancement.
  • Applied software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Actively participated in architecture framework design, coding, and test plan development.

Environment: Java JDK (1.5), Java J2EE, Servlets, JBoss Application Server, Waterfall, JSPs, DB2, RAD, XML, Web Server, JUnit, Spring, Hibernate, MS Access, Microsoft Excel, CSS, HTML, XPath, JavaScript, Spring MVC

Confidential

Junior Java Developer

Responsibilities:

  • Extensively involved in requirement analysis and system implementation. Actively involved in SDLC phases such as analysis, design, and development.
  • Responsible for developing modules and assisting in deployment per the client’s requirements.
  • Developed web components using JSP, Servlets, JDBC.
  • Made substantial contributions to simplifying ETL development and maintenance by creating reusable Source, Target, Mapplet, and Transformation objects.
  • Experience developing extract, transform, and load (ETL) processes, and maintaining and supporting the enterprise data warehouse system and corresponding marts.
  • Gained skills in web-based REST and SOAP APIs, and Apache for real-time data streaming.
  • Developed user interface using JSP, JavaScript and CSS Technologies.
  • Involved in designing the DB schema for the application. Implemented complex SQL queries, reusable triggers, functions, and stored procedures using PL/SQL.
  • Programmed Oracle SQL and T-SQL stored procedures, functions, triggers, and packages as back-end processes to create and update staging tables and log and audit tables, and to create primary keys.
  • Extensively used Transformations like Aggregator, Router, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
  • Involved in tool development, testing, and bug fixing. Performed unit testing for various modules.

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Confidential, MySQL, IBM WebSphere, JIRA, REST API, SOAP API, Apache, Oracle 10g/11g, SQL Loader, MS SQL Server, Aggregator, Router, Sequence Generator.