
Hadoop Developer Resume


Atlanta

SUMMARY:

  • Over 7 years of professional IT experience in the Big Data Hadoop ecosystem, covering ingestion, storage, querying, processing and analysis of big data.
  • Involved in the Software Development Life Cycle (SDLC) phases, which include Analysis, Design, Implementation, Testing and Maintenance.
  • Extensive knowledge of data serialization formats such as Avro, SequenceFiles, Parquet, JSON and ORC.
  • Hands-on experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem (Pig, Hive, Oozie, Flume and HBase).
  • Good experience with data transformation and storage: HDFS, MapReduce, Spark.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming and Spark SQL.
  • Strong working knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Worked on HBase to load and retrieve data for real-time processing using a REST API.
  • Experienced in using integrated development environments such as Eclipse, NetBeans, IntelliJ and Spring Tool Suite.
  • Used project management services such as JIRA for tracking issues and bugs related to code, GitHub for code reviews, and version control tools such as Git and SVN.
  • Good knowledge in using Apache NiFi to automate the data movement between different Hadoop systems.
  • Experienced in designing different time driven and data driven automated workflows using Oozie.
  • Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (a brief sketch follows this summary).
  • Excellent communication, interpersonal and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.
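
As an illustration of the Hive table design noted above, below is a minimal Scala/Spark sketch of an external raw table plus a partitioned and bucketed managed table; the table names, columns and HDFS paths are placeholders rather than details from any specific engagement.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object HiveTableLayout {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-table-layout")
          .enableHiveSupport()              // needed to create managed/external Hive tables
          .getOrCreate()

        // External table over raw files: dropping the table keeps the underlying data.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
            |  event_id STRING, payload STRING)
            |PARTITIONED BY (event_date STRING)
            |STORED AS PARQUET
            |LOCATION '/data/raw/events'""".stripMargin)

        // Managed table, partitioned by date and bucketed by event_id to speed up joins and sampling.
        spark.table("raw_events")
          .write
          .mode(SaveMode.Overwrite)
          .partitionBy("event_date")
          .bucketBy(32, "event_id")
          .sortBy("event_id")
          .saveAsTable("events_curated")

        spark.stop()
      }
    }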

TECHNICAL SKILLS:

Hadoop / Big Data Ecosystem: HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Spark, Flume, Kafka, Oozie, NiFi, YARN

Java Technologies: Core Java, JSP, JDBC, Eclipse

Programming Languages: Java, Python, C, C++, Linux shell scripts, Scala

Databases: MySQL, Oracle, MS SQL Server, HBase, NoSQL, MongoDB, Cassandra, Teradata

Operating Systems: Linux, Unix, Windows 7, Windows 8, Windows 10, iOS

Other tools: Tableau, Cloudera 5.x, Hortonworks, Akka, GitLab, Screwdriver, QueryGrid, Squirrel

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential

Responsibilities:

  • Implemented shell scripts to load data from Teradata into Hadoop using spark-submit and TDCH connectors.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner; responsible for loading data from the UNIX file system into HDFS.
  • Created Spark Streaming jobs that collect Adobe source data into a Hadoop landing location.
  • Pushed streaming data from Hadoop to an Akka collector and on to a Kafka topic, and processed the data in Spark using Scala scripts.
  • Implemented Spark applications in Scala and PySpark, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Involved in creating Hive managed/external tables while maintaining raw file integrity, and analyzed data using Hive queries.
  • Used Oozie and Oozie coordinators to deploy end to end data processing pipelines and scheduling workflows.
  • Worked on the CI/CD pipeline, integrating code changes into the GitHub repository and building with Jenkins.
  • Configured Kafka for efficiently collecting, aggregating and moving large amounts of click stream data from many different sources to HDFS. Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Experience in working with Hive data warehouse tool-creating tables, data distribution by implementing partitioning and bucketing, writing and optimizing the Hive queries.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming (sketched below).
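
The Kafka-to-HDFS leg of the streaming pipeline above could look roughly like the following Structured Streaming sketch; the broker address, topic name and HDFS paths are placeholders, and it assumes the spark-sql-kafka connector is on the job's classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClickstreamToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

        // Placeholder broker and topic -- substitute the real cluster values.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
          .withColumn("ingest_ts", current_timestamp())

        // Persist the raw stream to HDFS as Parquet; the checkpoint keeps the job restartable.
        val query = events.writeStream
          .format("parquet")
          .option("path", "/data/landing/clickstream")
          .option("checkpointLocation", "/checkpoints/clickstream")
          .start()

        query.awaitTermination()
      }
    }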

Environment: Hive, Java, Maven, Spark, Oozie, Teradata, YARN, GitHub, JUnit, Unix, Hortonworks, Kafka, Sqoop, HDFS, Scala, QueryGrid, Akka, TDCH

Hadoop Developer

Confidential, Atlanta

Responsibilities:

  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Built applications to get data from various sources and ingest it into Hadoop using Sqoop, Kafka and shell scripts.
  • Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process that uses accelerator commands instead of Oozie.
  • Implemented Python scripts that perform transformations and actions on tables and send incremental data to the next zone via spark-submit.
  • Created HBase tables to store data arriving in various formats from different sources.
  • Worked on improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the sketch after this list).
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
  • Tuned Spark jobs by adding more resources and executors to overcome out-of-memory issues with fast-moving, high-volume data.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS for persistence and in Hive for real-time reporting.
  • Used Sqoop to import the data to Hadoop Distributed File System (HDFS) from RDBMS.
  • Provided NoSQL solutions in HBase for data extraction and for storing huge amounts of data.
  • Monitored jobs in Control-M, debugged failed jobs, identified root causes and provided resolutions.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
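
A rough sketch of the kind of tuning mentioned above: caching a reused RDD and using reduceByKey instead of groupByKey so values are combined map-side before the shuffle. The input path and record layout are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object AggregationTuning {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("aggregation-tuning").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical input: one "userId,bytes" record per line.
        val usage = sc.textFile("/data/landing/usage/*.csv")
          .map(_.split(","))
          .collect { case Array(user, bytes) => (user, bytes.toLong) }
          .persist(StorageLevel.MEMORY_AND_DISK)     // reused twice below, so cache it once

        // reduceByKey combines values per key before the shuffle; groupByKey would ship every
        // raw value across the network and is a common cause of out-of-memory errors on skewed keys.
        usage.reduceByKey(_ + _).saveAsTextFile("/data/curated/usage_by_user_rdd")

        // The same aggregation through the DataFrame API, which lets Catalyst plan the shuffle.
        import spark.implicits._
        usage.toDF("user_id", "bytes")
          .groupBy("user_id").sum("bytes")
          .write.mode("overwrite").parquet("/data/curated/usage_by_user_df")

        usage.unpersist()
        spark.stop()
      }
    }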

Environment: Hive, HBase, Java, Maven, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Kafka, Sqoop, HDFS, Scala, Python, Control-M and ServiceNow.

 Hadoop Developer

Confidential, Texas

Responsibilities:

  • Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming of large sets of structured, semi-structured and unstructured data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark data frames, Scala and Python.
  • Expertise in implementing Spark Scala application using higher order functions for both batch and interactive analysis requirement.
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Wrote Scala scripts to integrate Spark Streaming with Kafka as part of Spark-Kafka integration efforts.
  • Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala and Spark connectors.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
  • Developed various custom filters and handled pre-defined filters on HBase data using the HBase API.
  • Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the transformed data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Responsible for developing data pipeline by implementing Kafka producers and consumers and configuring brokers.
  • Experience in managing and reviewing Hadoop Log files.
  • Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
  • Set up Spark on EMR to process huge datasets stored in Amazon S3.
  • Used the AWS CLI for data transfers to and from Amazon S3 buckets.
  • Executed Hadoop/Spark jobs on AWS EMR with data stored in S3 buckets.
  • Built and configured Apache Tez for Hive and Pig to achieve better response times than MapReduce jobs.
  • Experienced in pulling data from Amazon S3 buckets into the data lake, building Hive tables on top of it and creating DataFrames in Spark for further analysis (see the sketch after this list).
  • Used Sqoop to transfer data between relational databases and Hadoop.
  • Worked on HDFS to store and access huge datasets within Hadoop.
  • Good hands-on experience with Git and GitHub.
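
A simplified sketch of the S3-to-data-lake flow described in this role: reading raw files from S3 on EMR, registering a Hive table and creating a summary DataFrame. The bucket, database, table and column names are illustrative assumptions.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object S3ToDataLake {
      def main(args: Array[String]): Unit = {
        // On EMR, S3 access is handled by the cluster's instance role, so only the
        // (placeholder) paths below need to change.
        val spark = SparkSession.builder()
          .appName("s3-to-datalake")
          .enableHiveSupport()
          .getOrCreate()

        val orders = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3://example-bucket/raw/orders/")    // placeholder bucket and prefix

        // Land the data as a Hive table (assumes a "lake" database already exists).
        orders.write.mode(SaveMode.Overwrite).saveAsTable("lake.orders")

        // Build a summary DataFrame on top of the Hive table for downstream analysis.
        spark.sql(
          """SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
            |FROM lake.orders
            |GROUP BY order_date""".stripMargin)
          .show(20, truncate = false)

        spark.stop()
      }
    }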

Environment: Cloudera 5.8, Hadoop 2.7.2, HDFS 2.7.2, AWS, Pig 0.16.0, Hive 2.0, Impala, Drill 1.9, Spark SQL 1.6.1, MapReduce 1.x, Flume 1.7.0, Sqoop 1.4.6, Oozie 4.1, Storm 1.0, Docker 1.12.1, Kafka 0.10, Spark 1.6.3, Scala 2.12, HBase 0.98.19, ZooKeeper 3.4.9, MySQL, Tableau, Shell Scripting, Java.
