Hadoop Developer Resume

Atlanta, GA

SUMMARY

  • 7+ years of overall IT experience, including 5+ years in Big Data technologies (designing and implementing MapReduce and Spark architectures), 2+ years in AWS Cloud, 2+ years in Java technologies and 1+ years in Elasticsearch (Elastic Cloud Enterprise).
  • Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie and ZooKeeper, with knowledge of the Mapper/Reducer/HDFS framework.
  • Good understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager and NodeManager, and of the MapReduce programming paradigm.
  • Experience in AWS services like EMR, EC2, S3, CloudFormation stacks, Glue, Redshift, DynamoDB, Aurora RDS, CloudWatch, SNS, Lambda and Step Functions.
  • Experience in creating IAM Roles, Security Groups, Firewalls and Load Balancers.
  • Good exposure and experience in Spark, Scala, Big Data and AWS Stack.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in shell scripting, used extensively with Spark for data processing.
  • Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
  • Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like Cassandra and HBase.
  • Analyzing data through HiveQL, Pig Latin and MapReduce programs in Java.
  • Extending Hive and Pig core functionality by implementing custom UDFs.
  • Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing and joins in Hive for faster data access.
  • Installing and using Elasticsearch and Kibana; using curl commands to query data, perform complex aggregations, and insert, update and delete records in Elasticsearch indexes.
  • Writing data to Elasticsearch indexes through Spark on AWS EMR.
  • Importing and exporting data to and from HDFS and Hive using Sqoop.
  • Hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.
  • Hands-on experience with Hortonworks, Cloudera and MapR Hadoop environments.
  • Experience in writing Maven and SBT scripts to build and deploy Java and Scala Applications.
  • Good understanding of Hadoop administration with Hortonworks, Cloudera & MapR.
  • Used Spark modules such as Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.
  • Used Spark DataFrame operations to perform the required validations on data and to run analytics on Hive data.
  • Used Spark SQL on DataFrames to read Hive tables into Spark for faster data processing.
  • Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to DataFrames for processing, and saved the data in Parquet format in HDFS.
  • Experience in creating Impala views on Hive tables for fast access to data.
  • Experienced in running queries with Impala and using BI tools to run ad-hoc queries directly on Hadoop.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables and HBase.
  • Experience in performance tuning and in monitoring Hadoop clusters by gathering and analyzing data on the existing infrastructure using Cloudera Manager.
  • Involved in production monitoring using workflow monitor and experience in development and support environments.
  • Experienced in Waterfall, Agile and Scrum software development methodologies.
  • Strong knowledge of version control tools like Bitbucket and GitHub.
  • Good level of experience in Core Java, JEE technologies, JDBC, Servlets and JSP.
  • Good knowledge in Oracle PL/SQL and shell scripting.
  • Experience in design and development of web forms using Spring MVC, JavaScript, JSON and JQ plotter.
  • Developing and maintaining web applications on the Apache Tomcat web server.
  • Experience in using IDEs like Eclipse and IntelliJ.
  • Able and willing to learn new technologies as needed.
  • Excellent oral and written communication skills.
  • Good proficiency in Microsoft Excel, Word, PowerPoint and Visual Studio.
  • Active team player with excellent interpersonal skills, keen learner with self-commitment & innovation.
  • Ability to meet deadlines and handle pressure in coordinating multiple tasks in the work environment.

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Spark, Flume, Kafka, Oozie, Sqoop, Impala, ZooKeeper, HBase and Cassandra

Hadoop Distributions: EMR, Cloudera, Hortonworks and MapR.

Languages: Java, Scala, Python, LINUX Shell Scripting, SQL

AWS Services: EC2, EMR, S3, Lambda, Step Functions, CloudFormation Stacks, Aurora RDS, IAM Roles, CloudWatch, SNS, VPC, Security Groups and Load Balancers.

Elastic Cloud Enterprise: Elasticsearch, Kibana and Logstash.

Scripts: JavaScript, Shell Scripting

Database: Oracle 10g, MySQL, MSSQL

NoSQL Databases: HBase, Cassandra, MongoDB

Web Servers: Apache Tomcat

Operating Systems: Windows, Linux (CentOS)

CI/CD Pipeline: Jenkins and Bamboo.

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • This project is part of a PBM (Pharmacy Benefit Manager) and provides analytics on the claims data processed by the firm over a given time period. It helps customers identify the total amount of claims they have made or processed, along with other insights.
  • Design and develop a Spark framework for ETL using Spark modules such as Spark Core, Spark SQL, Datasets and DataFrames, coded in Scala.
  • Hands-on coding in Scala, leveraging Apache Spark through its Scala APIs.
  • Developed various main and service classes in Scala using Spark SQL for requirement-specific tasks.
  • Strong familiarity with Hive joins; used HQL to query the databases, eventually leading to complex Hive UDFs.
  • Run ETL jobs using Apache Spark on AWS EMR to extract data from multiple cloud services such as AWS S3 and EMR, and write the data to Elasticsearch indexes after performing the required transformations.
  • Integrate different AWS services; install, configure and run Elasticsearch on AWS EC2, creating the necessary Security Groups, VPCs, load balancers and firewalls for successful integration.
  • Spin up EC2 instances, create IAM Roles and EMR clusters, and write the Lambda Functions, Step Functions and CloudFormation templates that are necessary for running daily jobs.
  • Automate and deploy CloudFormation stacks to model or build the entire infrastructure and application resources like Lambda, Step Functions, Security Groups, CloudWatch rules and SNS.
  • Write Python scripts to run transient EMR clusters and Spark and Hive jobs using the boto3 library, and to perform DML operations on Aurora RDS tables using the pymysql library.
  • Write Python scripts to automate daily job runs: check cluster status and send triggers, trigger Step Functions or other Lambda functions, check job status and send email alerts, run Hive scripts, terminate clusters and more.
  • Deploy Lambda functions written in Python to leverage the serverless compute service.
  • Create Step Functions based on the requirement to orchestrate complex flows using Lambda Functions.
  • Develop tasks and set up the required environment for running Hadoop in the cloud on various instances.
  • Use Snowflake for data storage, processing and analytics as a faster, easier and more flexible tool that is not built on a big data platform such as Hadoop but instead uses a new SQL query engine with an innovative architecture natively designed for the cloud.
  • Create stages on top of AWS S3 and write functions, procedures, tasks and monitors to perform daily activities, finally merging the results into the target table.
  • Thorough knowledge of installing standalone Elasticsearch on AWS EC2 and of using Elastic Cloud Enterprise (ECE) to deploy Elasticsearch (ES) and Kibana from Docker images.
  • Write data to Elasticsearch indexes using Spark through AWS EMR.
  • Work with Elasticsearch and Kibana as an integrated logging and dashboard system, which is also required for monitoring and managing application health.
  • Create ES indexes and aliases, and use curl to query, perform complex aggregations, and update or delete records in Elasticsearch indexes.
  • Use Logstash to build pipelines that move logs from Elasticsearch to AWS S3.
  • Work in an Agile model using JIRA for planning, organizing and collaborating with the team on all solutions.
  • Interpret mapping documents and work closely with Business Analysts on enhancements and modifications.
  • Perform code builds using Maven and push the code to the repository using Bitbucket as the version control tool, which maintains all the code and data repositories.
  • Use Bamboo to trigger builds and check the progress of each build.

Environment: Spark 2.4.3, Scala 2.11.12, Java 1.8.0_242, EMR 5.28.0, YARN, ECE 2.4.0, Elasticsearch 7.6.1, Kibana, AWS EC2, S3, Aurora RDS, Hive, Eclipse and Bitbucket.

Confidential, Hartford, CT

Hadoop Developer

Responsibilities:

  • Worked on the MapR distributed platform to store and analyze huge volumes of data (typically Big Data).
  • Worked on Big Data/Hadoop infrastructure for both batch and real-time data processing. Responsible for building scalable distributed data solutions using Hadoop with Spark.
  • Wrote Kafka consumer jobs to consume data in real time, apply the required transformations and aggregations, and store it in Hive and HBase (NoSQL).
  • Consumed terabytes of IoT device data in JSON format through Kafka consumers in real time and stored it in Hive and HBase tables.
  • Stored data in HBase tables with different row key combinations to hold the latest data per member per day as well as historical data with aggregations.
  • Populated HBase tables through automation. Used the Spark shell for querying the database.
  • Developed various provisioning jobs using Spark with Scala to deliver data to downstream partners.
  • Worked on various optimization techniques in Spark with Scala to improve the efficiency of jobs.
  • Worked with various file formats like JSON, CSV, text and Parquet, using Spark with Scala to read the different formats and load the data into the database.
  • Worked on provisioning data to downstream systems in FHIR (Fast Healthcare Interoperability Resources) format. Good understanding of the different resource types available in FHIR and good knowledge of developing code for the FHIR format.
  • Worked on setting up the infrastructure for Docker Containers & OpenShift.
  • Working experience with container orchestration tools like OpenShift and Kubernetes.
  • Packaged the developed code's JAR files as Docker images and deployed the images to the OpenShift environment to run them as Docker containers.
  • Implemented continuous integration and continuous deployment (CI/CD) of code in production using Jenkins pipeline.
  • Build Tools used were Maven and SBT.
  • Implemented Spark batch jobs.
  • Used Fortify Scan and SonarQube to improve the code coverage.
  • Worked with TWS workflow scheduler to build workflows with dependencies & to run jobs.

Environment: MapR Hadoop Distribution, Hive, Kafka, HBase, IntelliJ, Maven builds, Spark, Spark SQL, Oozie, Linux/Unix, Shell Scripting, Git, OpenShift, Kubernetes, Jenkins Pipeline.
