
Hadoop Developer Resume


NYC, NY

SUMMARY

  • 5+ years of experience working with Big Data technologies on systems that comprise massive amounts of data running in highly distributed Hadoop environments.
  • Hands-on experience using Hadoop and its ecosystem components: Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
  • Strong knowledge of the architecture and components of Spark; efficient in working with Spark Core, Spark SQL and Spark Streaming.
  • Implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets) and used Spark and spark-shell accordingly.
  • Experience in configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala (see the first sketch after this list).
  • Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
  • Wrote complex HiveQL queries for required data extraction from Hive tables and developed Hive UDFs as required.
  • Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the second sketch after this list).
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
  • Used Spark DataFrame operations to perform required validations on the data.
  • Experience in integrating Hive queries into the Spark environment using Spark SQL.
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
  • Worked on HBase to load and retrieve data for real-time processing using its REST API.
  • Excellent understanding and knowledge of job workflow scheduling and locking tools/services like Oozie and ZooKeeper.
  • Experienced in designing time-driven and data-driven automated workflows using Oozie.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
  • Worked on developing ETL Workflows on the data obtained using Python for processing it in HDFS and HBase using Oozie.
  • Experience in configuring ZooKeeper to coordinate servers in clusters and to maintain data consistency.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
  • Good Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Used project management services like JIRA for tracking issues and code-related bugs, used GitHub for code reviews, and worked with version control tools such as Git and SVN.
  • Experienced in working with SDLC, Agile and Waterfall methodologies.
  • Excellent communication, interpersonal and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.
  • Good understanding of Scrum methodologies, Test-Driven Development and continuous integration.
  • Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and being a self-motivated, focused and adaptive team player with excellent interpersonal, technical and communication skills.
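
A minimal sketch of the Spark Streaming setup referenced above, assuming a hypothetical topic name, broker address and HDFS output path, and the spark-streaming-kafka-0-10 integration on the classpath:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Hypothetical broker and consumer-group names
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams)
    )

    // Persist each non-empty micro-batch to HDFS as text files
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/clickstream/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```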
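
A hedged sketch of converting a HiveQL query into DataFrame transformations, assuming a hypothetical Hive table sales.transactions and column names; the spark.sql form and the DataFrame form are shown side by side, along with a simple validation:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("HiveToDataFrames")
  .enableHiveSupport()
  .getOrCreate()

// HiveQL version, run through Spark SQL
val viaSql = spark.sql(
  """SELECT region, SUM(amount) AS total_amount
    |FROM sales.transactions
    |WHERE dt = '2019-01-01'
    |GROUP BY region""".stripMargin)

// Equivalent DataFrame transformations
val viaDf = spark.table("sales.transactions")
  .filter(col("dt") === "2019-01-01")
  .groupBy("region")
  .agg(sum("amount").alias("total_amount"))

viaDf.show()

// Simple validation: reject negative amounts before trusting the aggregation
val invalidRows = spark.table("sales.transactions").filter(col("amount") < 0).count()
require(invalidRows == 0, s"$invalidRows rows failed the amount validation")
```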

PROFESSIONAL EXPERIENCE

Confidential, NYC, NY

Hadoop Developer

Responsibilities:

  • Worked on business problems to develop and articulate solutions using Teradata.
  • Worked on analyzing different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
  • Involved in extracting customers' data from various data sources into Hadoop HDFS.
  • Imported data from structured data sources into HDFS using Sqoop incremental imports.
  • Created various documents such as the Source-to-Target data mapping document, unit test cases and the data migration document.
  • Developed SQL scripts using Spark for handling different data sets and verifying their performance over MapReduce jobs (see the sketch after this list).
  • Utilized Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Worked on tuning Hive to improve performance and resolved performance issues in the scripts.
  • Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
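
A minimal sketch of the kind of Spark SQL script described above, assuming a hypothetical HDFS landing directory for Sqoop-imported files, hypothetical column names, and an existing analytics database:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SqoopImportAnalysis")
  .enableHiveSupport()
  .getOrCreate()

// Read the Sqoop-imported files from a hypothetical HDFS landing directory
val orders = spark.read
  .option("header", "false")
  .option("inferSchema", "true")
  .csv("hdfs:///data/landing/orders")
  .toDF("order_id", "customer_id", "order_date", "amount")

orders.createOrReplaceTempView("orders")

// An aggregation that would previously have run as a MapReduce job, expressed in Spark SQL
val dailyTotals = spark.sql(
  """SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
    |FROM orders
    |GROUP BY order_date""".stripMargin)

// Assumes the target database already exists
dailyTotals.write.mode("overwrite").saveAsTable("analytics.daily_order_totals")
```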

Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, HBASE, Oracle, Teradata, Scala, Spark, Linux.

Confidential, Beaverton, Oregon

Hadoop Developer

Responsibilities:

  • Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase and MapR-DB.
  • Worked closely with the business analysts to convert the business requirements into technical requirements and prepared low- and high-level documentation.
  • Worked on business problems to develop and articulate solutions using Teradata.
  • Worked on analyzing different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
  • Designed a high-level ETL architecture for overall data transfer from OLTP to OLAP systems.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL and Spark on YARN.
  • Involved in creating a data lake by extracting customers' big data from various data sources into Hadoop HDFS.
  • Created various documents such as the Source-to-Target data mapping document, unit test cases and the data migration document.
  • Imported data from structured data sources into HDFS using Sqoop incremental imports.
  • Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
  • Created Hive tables and partitions and implemented incremental imports to perform ad-hoc queries on structured data (see the first sketch after this list).
  • Worked with NoSQL databases like HBase, Cassandra, DynamoDB (AWS) and MongoDB.
  • Involved in loading data from the UNIX file system to HDFS using Flume and the HDFS API.
  • Developed SQL scripts using Spark for handling different data sets and verifying their performance over MapReduce jobs.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python (see the second sketch after this list).
  • Supported MapReduce programs running on the cluster and also wrote MapReduce jobs using the Java API.
  • Utilized Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Wrote complex SQL and PL/SQL queries for stored procedures.
  • Used S3 buckets to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, and handled loading and storage of data for MapReduce operations.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs through Scala programming.
  • Integrated Kafka with Spark Streaming for high-efficiency throughput and reliability.
  • Worked on Spark for collecting and aggregating huge amounts of log data, storing it on HDFS for further analysis.
  • Worked on tuning Hive to improve performance and resolved performance issues in the scripts.
  • Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
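
A hedged sketch of the Hive partition and incremental-import pattern referenced above, with hypothetical bucket, database and column names; each load overwrites only its date partition so re-runs stay idempotent:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("IncrementalHiveLoad")
  .enableHiveSupport()
  .getOrCreate()

// Target table: partitioned by load date so ad-hoc queries can prune partitions
// (assumes the analytics database already exists)
spark.sql(
  """CREATE TABLE IF NOT EXISTS analytics.orders (
    |  order_id BIGINT,
    |  customer_id BIGINT,
    |  amount DOUBLE
    |) PARTITIONED BY (dt STRING)
    |STORED AS PARQUET""".stripMargin)

// Incremental slice landed by the upstream import (hypothetical S3 location)
val increment = spark.read.parquet("s3a://example-bucket/landing/orders/dt=2019-01-01/")
increment.createOrReplaceTempView("orders_increment")

// Overwrite only the partition for this load date
spark.sql(
  """INSERT OVERWRITE TABLE analytics.orders PARTITION (dt = '2019-01-01')
    |SELECT order_id, customer_id, amount FROM orders_increment""".stripMargin)
```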
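
A minimal sketch of the kind of MapReduce-to-Spark RDD conversion described above, assuming a hypothetical log path and a simple per-level count aggregation:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("LogAggregation").getOrCreate()
val sc = spark.sparkContext

// Read raw application logs from HDFS (hypothetical path and layout)
val logs = sc.textFile("hdfs:///data/raw/app-logs/*")

// Map phase equivalent: emit (logLevel, 1) per line
// Reduce phase equivalent: sum the counts per level
val levelCounts = logs
  .map(_.split("\\s+"))
  .filter(_.length > 2)
  .map(fields => (fields(2), 1L))   // assumes the third token is the log level
  .reduceByKey(_ + _)

// Persist the aggregated result back to HDFS for further analysis
levelCounts.saveAsTextFile("hdfs:///data/processed/log-level-counts")
```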

Environment: HDFS, Hadoop, Kafka, MapReduce, Spark, Impala, Hive, Avro, Parquet, Scala, JAVA.

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Worked on importing and exporting data from Oracle into HDFS using Sqoop for analysis, visualization and to generate reports.
  • Performed full and incremental imports and created Sqoop jobs.
  • Implemented multiple MapReduce jobs for data processing.
  • Managing and scheduling batch Jobs on a Hadoop Cluster using Control-M.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created internal and external tables in Hive and merged data sets using Hive joins; involved in the integration of Hive and HBase.
  • Designed and developed Hive managed/external tables using struct, map and array types with various storage formats (see the first sketch after this list).
  • Implemented various performance techniques (partitioning, bucketing) in Hive to get better performance.
  • Worked with different Hadoop file formats like Parquet and compression techniques like gzip & Snappy.
  • Built real time pipeline for streaming data using Kafka and Spark Streaming.
  • Implemented Python scripts to perform transformations and loaded the data into Hive.
  • Worked on Spark SQL for analyzing the data.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Experienced in implementing Spark RDD transformations, actions to implement business analysis.
  • Developed Spark code to read data from Hive, group the fields and generate XML files (see the second sketch after this list).
  • Implemented Spark and Spark SQL for faster testing and processing of data.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
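
A hedged sketch of a Hive external table with struct, array and map columns, created and queried through Spark SQL; the database, location and schema names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveComplexTypes")
  .enableHiveSupport()
  .getOrCreate()

// External, partitioned table using struct, array and map columns (hypothetical schema)
spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS retail.customers (
    |  customer_id BIGINT,
    |  address STRUCT<street:STRING, city:STRING, zip:STRING>,
    |  phone_numbers ARRAY<STRING>,
    |  attributes MAP<STRING, STRING>
    |) PARTITIONED BY (dt STRING)
    |STORED AS PARQUET
    |LOCATION 'hdfs:///data/warehouse/retail/customers'""".stripMargin)

// Complex fields are queried with dot, index and key syntax
spark.sql(
  """SELECT customer_id, address.city, phone_numbers[0], attributes['segment']
    |FROM retail.customers
    |WHERE dt = '2019-01-01'""".stripMargin).show()
```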
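
A minimal sketch of reading a Hive table, grouping fields and writing XML, assuming hypothetical table and column names and the spark-xml package on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("HiveToXml")
  .enableHiveSupport()
  .getOrCreate()

// Read the source Hive table and group the fields of interest (hypothetical names)
val grouped = spark.table("retail.customer_orders")
  .groupBy("customer_id")
  .agg(count("order_id").alias("order_count"), sum("amount").alias("total_spent"))

// Write each row as an XML record; requires the spark-xml package
grouped.write
  .format("com.databricks.spark.xml")
  .option("rootTag", "customers")
  .option("rowTag", "customer")
  .save("hdfs:///data/exports/customer_orders_xml")
```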

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
