Hadoop Developer Resume
NYC, NY
SUMMARY
- 5+ years of experience working with Big Data technologies on systems that comprise massive amounts of data running in highly distributed Hadoop environments.
- Hands-on experience using Hadoop ecosystem components such as HDFS, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
- Strong knowledge of Spark architecture and components; efficient in working with Spark Core, Spark SQL and Spark Streaming.
- Implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets) and used Spark and the spark-shell accordingly.
- Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala.
- Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Wrote complex HiveQL queries for required data extraction from Hive tables and developed Hive UDFs as needed.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Involved in converting Hive/SQL queries into Spark transformations using Spark Data frames and Scala.
- Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
- Used Spark Data Frame Operations to perform required Validations in the data.
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
- Worked on HBase to load and retrieve data for real time processing using Rest API.
- Excellent understanding of job workflow scheduling and cluster coordination tools/services such as Oozie and ZooKeeper.
- Experienced in designing different time driven and data driven automated workflows using Oozie.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
- Worked on developing ETL Workflows on the data obtained using Python for processing it in HDFS and HBase using Oozie.
- Experience in configuring the Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Good Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
- Used project management tools such as JIRA for tracking issues and bugs, GitHub for code reviews, and version control tools including Git and SVN.
- Experienced in working in SDLC, Agile and Waterfall Methodologies.
- Excellent communication, interpersonal and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and a self-motivated, focused, team-oriented approach backed by strong interpersonal, technical and communication skills.
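The Hive bucketing mentioned above distributes rows across a fixed number of files by hashing the bucketing column modulo the bucket count. A minimal Python sketch of that assignment logic (the bucket count, key values and hash function are illustrative, not Hive's exact implementation):

```python
# Sketch of how Hive-style bucketing assigns a row to one of N buckets:
# hash the bucketing column's value, then take it modulo the bucket count.
import zlib

NUM_BUCKETS = 4  # would correspond to "CLUSTERED BY (user_id) INTO 4 BUCKETS" in a Hive DDL


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic hash-mod bucket assignment (illustrative hash, not Hive's)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


rows = ["user_001", "user_002", "user_003", "user_004"]
assignments = {k: bucket_for(k) for k in rows}
# Every key maps to a stable bucket in the range [0, NUM_BUCKETS)
assert all(0 <= b < NUM_BUCKETS for b in assignments.values())
```

Because the assignment is deterministic, equal keys always land in the same bucket file, which is what lets Hive prune buckets and perform bucketed map-side joins.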
PROFESSIONAL EXPERIENCE
Confidential, NYC, NY
Hadoop Developer
Responsibilities:
- Worked on business problems to develop and articulate solutions using Teradata.
- Worked on analyzing different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
- Involved in extracting customer data from various data sources into Hadoop HDFS.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Created various documents such as the source-to-target data mapping document, unit test cases and the data migration document.
- Developed SQL scripts using Spark for handling different data sets and verifying their performance against MapReduce jobs.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Worked on tuning Hive to improve performance and resolved performance issues in scripts.
- Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
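An incremental Sqoop import of the kind described above is driven by a command line whose flags control what "new" means. The sketch below assembles that command in Python so the flags are explicit; the JDBC URL, table, check column and target directory are hypothetical placeholders:

```python
# Sketch of an incremental-append Sqoop import from an RDBMS into HDFS.
# All identifiers (JDBC URL, table name, check column) are placeholders.
def build_sqoop_import(jdbc_url: str, table: str, check_column: str, last_value: str):
    """Return the argv for a Sqoop incremental import in append mode."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--incremental", "append",       # only pull rows newer than --last-value
        "--check-column", check_column,  # monotonically increasing column
        "--last-value", last_value,
        "--target-dir", f"/data/raw/{table}",
    ]


cmd = build_sqoop_import("jdbc:oracle:thin:@//db:1521/ORCL", "ORDERS", "ORDER_ID", "10000")
assert "--incremental" in cmd and "append" in cmd
```

On each run, Sqoop reports the new high-water mark for the check column, which is fed back in as `--last-value` on the next run (or managed automatically via a saved Sqoop job).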
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, HBASE, Oracle, Teradata, Scala, Spark, Linux.
Confidential, Beaverton, Oregon
Hadoop Developer
Responsibilities:
- Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase and MapR-DB.
- Worked closely with business analysts to convert business requirements into technical requirements, and prepared low- and high-level documentation.
- Worked on business problems to develop and articulate solutions using Teradata.
- Worked on analyzing different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
- Designed high-level ETL architecture for overall data transfer from OLTP to OLAP systems.
- Improved the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL and Spark on YARN.
- Involved in creating a data lake by extracting customers' big data from various data sources into Hadoop HDFS.
- Created various documents such as the source-to-target data mapping document, unit test cases and the data migration document.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
- Created Hive tables and partitions, and implemented incremental imports to perform ad-hoc queries on structured data.
- Worked with NoSQL databases such as HBase, Cassandra, DynamoDB (AWS) and MongoDB.
- Involved in loading data from the UNIX file system to HDFS using Flume and the HDFS API.
- Developed SQL scripts using Spark for handling different data sets and verifying their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs with Scala and Python.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Wrote complex SQL and PL/SQL queries for stored procedures.
- Used S3 buckets to store jars and input datasets, and used DynamoDB to store the processed output from the input data set.
- Created MapReduce jobs running over HDFS for data mining and analysis using R, and handled loading and storing of data for MapReduce operations.
- Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high-throughput, reliable data processing.
- Worked on Spark for collecting and aggregating huge amounts of log data, stored on HDFS for further analysis.
- Worked on tuning Hive to improve performance and resolved performance issues in scripts.
- Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
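The MapReduce jobs referenced above all follow the same map, shuffle, reduce pattern, which can be illustrated locally without a cluster. A plain-Python word-count sketch of the three phases (input lines are made up for the example):

```python
# Local illustration of the MapReduce pattern: map -> shuffle (group by key) -> reduce.
from collections import defaultdict


def map_phase(lines):
    """Mapper: emit a (key, value) pair per word, as a MapReduce mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all values by key, mimicking the framework's sort/shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["Hadoop MapReduce", "Spark and Hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
assert counts["hadoop"] == 2  # "Hadoop" appears in both input lines
```

On a real cluster the shuffle is performed by the framework between the mapper and reducer tasks; in Spark the equivalent is a `reduceByKey` over an RDD of `(word, 1)` pairs.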
Environment: HDFS, Hadoop, Kafka, MapReduce, Spark, Impala, Hive, Avro, Parquet, Scala, Java.
Confidential, New York, New York
Hadoop Developer
Responsibilities:
- Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Worked on importing and exporting data from Oracle into HDFS using Sqoop for analysis, visualization and to generate reports.
- Performed full and incremental imports and created Sqoop jobs.
- Implemented multiple MapReduce jobs for data processing.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Control-M.
- Used Zookeeper to provide coordination services to the cluster.
- Involved in loading data from edge node to HDFS using shell scripting.
- Created internal and external tables in Hive and merged data sets using Hive joins. Involved in the integration of Hive and HBase.
- Designed and developed Hive managed/external tables using struct, map and array types with various storage formats.
- Implemented various performance techniques (partitioning, bucketing) in Hive to get better performance.
- Worked with different Hadoop file formats like Parquet and compression techniques like gzip & Snappy.
- Built real time pipeline for streaming data using Kafka and Spark Streaming.
- Implemented Python scripts to perform transformations and load the data into Hive.
- Worked on the Spark SQL for analyzing the data.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
- Developed Spark code to read data from Hive, group the fields and generate XML files.
- Implemented Spark and Spark SQL for faster testing and processing of data.
- Involved in HDFS maintenance and loading of structured and unstructured data.
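The "read from Hive, group the fields and generate XML files" step above amounts to grouping rows by a key and serializing each group. A standalone Python sketch of that shape using only the standard library (the record layout and field names are hypothetical, and a real job would read the rows from Hive rather than a literal list):

```python
# Group records by a key field and emit one XML element per group,
# mirroring the "read, group, generate XML" flow described above.
from itertools import groupby
from xml.etree.ElementTree import Element, SubElement, tostring

# Placeholder rows standing in for the result of a Hive query.
records = [
    {"customer": "c1", "amount": "10"},
    {"customer": "c1", "amount": "5"},
    {"customer": "c2", "amount": "7"},
]


def to_xml(rows):
    """Serialize rows as XML, one <customer> element per distinct key."""
    root = Element("customers")
    rows = sorted(rows, key=lambda r: r["customer"])  # groupby requires sorted input
    for cust, group in groupby(rows, key=lambda r: r["customer"]):
        node = SubElement(root, "customer", id=cust)
        for r in group:
            SubElement(node, "order", amount=r["amount"])
    return tostring(root, encoding="unicode")


xml = to_xml(records)
assert xml.count("<customer ") == 2  # two distinct customers in the sample data
```

In Spark the grouping would typically be a `groupByKey` or a DataFrame `groupBy`, with the XML serialization applied per group before writing the files out to HDFS.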
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
