Hadoop Developer Resume
NYC, NY
SUMMARY
- 5+ years of experience working with Big Data technologies on systems that comprise massive amounts of data running in highly distributed Hadoop environments.
- Hands-on experience using Hadoop ecosystem components such as HDFS, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
- Strong knowledge of Spark architecture and components; efficient in working with Spark Core, Spark SQL and Spark Streaming.
- Implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets) and used Spark and the spark-shell accordingly.
- Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala.
- Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Wrote complex HiveQL queries for required data extraction from Hive tables and developed Hive UDFs as needed.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Involved in converting Hive/SQL queries into Spark transformations using Spark Data frames and Scala.
- Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
- Used Spark Data Frame Operations to perform required Validations in the data.
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
- Worked on HBase to load and retrieve data for real time processing using Rest API.
- Excellent understanding of job workflow scheduling and cluster coordination tools/services such as Oozie and ZooKeeper.
- Experienced in designing different time driven and data driven automated workflows using Oozie.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
- Worked on developing ETL Workflows on the data obtained using Python for processing it in HDFS and HBase using Oozie.
- Experience in configuring the Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Good Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
- Used project management tools such as JIRA for tracking issues and bugs, GitHub for code reviews, and version control tools including Git and SVN.
- Experienced in working in SDLC, Agile and Waterfall Methodologies.
- Excellent communication, interpersonal and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and a self-motivated, focused, team-oriented approach backed by strong interpersonal, technical and communication skills.
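The Hive bucketing mentioned above distributes rows across a fixed number of files by hashing the bucketing column modulo the bucket count. A minimal Python sketch of that assignment logic (the bucket count, key values and hash function are illustrative, not Hive's exact implementation):

```python
# Sketch of how Hive-style bucketing assigns a row to one of N buckets:
# hash the bucketing column's value, then take it modulo the bucket count.
import zlib

NUM_BUCKETS = 4  # would correspond to "CLUSTERED BY (user_id) INTO 4 BUCKETS" in a Hive DDL


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic hash-mod bucket assignment (illustrative hash, not Hive's)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


rows = ["user_001", "user_002", "user_003", "user_004"]
assignments = {k: bucket_for(k) for k in rows}
# Every key maps to a stable bucket in the range [0, NUM_BUCKETS)
assert all(0 <= b < NUM_BUCKETS for b in assignments.values())
```

Because the assignment is deterministic, equal keys always land in the same bucket file, which is what lets Hive prune buckets and perform bucketed map-side joins.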
PROFESSIONAL EXPERIENCE
Confidential, NYC, NY
Hadoop Developer
Responsibilities:
- Worked on business problems to develop and articulate solutions using Teradata.
- Worked on analyzing different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
- Involved in extracting customer data from various data sources into Hadoop HDFS.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Created various documents such as the source-to-target data mapping document, unit test cases and the data migration document.
- Developed SQL scripts using Spark for handling different data sets and verifying their performance against MapReduce jobs.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Worked on tuning Hive to improve performance and resolved performance issues in scripts.
- Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
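An incremental Sqoop import of the kind described above is driven by a command line whose flags control what "new" means. The sketch below assembles that command in Python so the flags are explicit; the JDBC URL, table, check column and target directory are hypothetical placeholders:

```python
# Sketch of an incremental-append Sqoop import from an RDBMS into HDFS.
# All identifiers (JDBC URL, table name, check column) are placeholders.
def build_sqoop_import(jdbc_url: str, table: str, check_column: str, last_value: str):
    """Return the argv for a Sqoop incremental import in append mode."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--incremental", "append",       # only pull rows newer than --last-value
        "--check-column", check_column,  # monotonically increasing column
        "--last-value", last_value,
        "--target-dir", f"/data/raw/{table}",
    ]


cmd = build_sqoop_import("jdbc:oracle:thin:@//db:1521/ORCL", "ORDERS", "ORDER_ID", "10000")
assert "--incremental" in cmd and "append" in cmd
```

On each run, Sqoop reports the new high-water mark for the check column, which is fed back in as `--last-value` on the next run (or managed automatically via a saved Sqoop job).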
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, HBASE, Oracle, Teradata, Scala, Spark, Linux.
Confidential, Beaverton, Oregon
Hadoop Developer
Responsibilities:
- Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase and MapR-DB.
- Worked closely with business analysts to convert business requirements into technical requirements, and prepared low- and high-level documentation.
- Worked on business problems to develop and articulate solutions using Teradata.
- Worked on analyzing different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
- Designed high-level ETL architecture for overall data transfer from OLTP to OLAP systems.
- Improved the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL and Spark on YARN.
- Involved in creating a data lake by extracting customers' big data from various data sources into Hadoop HDFS.
- Created various documents such as the source-to-target data mapping document, unit test cases and the data migration document.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
- Created Hive tables and partitions, and implemented incremental imports to perform ad-hoc queries on structured data.
- Worked with NoSQL databases such as HBase, Cassandra, DynamoDB (AWS) and MongoDB.
- Involved in loading data from the UNIX file system to HDFS using Flume and the HDFS API.
- Developed SQL scripts using Spark for handling different data sets and verifying their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs with Scala and Python.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Wrote complex SQL and PL/SQL queries for stored procedures.
- Used S3 buckets to store jars and input datasets, and used DynamoDB to store the processed output from the input data set.
- Created MapReduce jobs running over HDFS for data mining and analysis using R, and handled loading and storing of data for MapReduce operations.
- Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high-throughput, reliable data processing.
- Worked on Spark for collecting and aggregating huge amounts of log data, stored on HDFS for further analysis.
- Worked on tuning Hive to improve performance and resolved performance issues in scripts.
- Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
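The MapReduce jobs referenced above all follow the same map, shuffle, reduce pattern, which can be illustrated locally without a cluster. A plain-Python word-count sketch of the three phases (input lines are made up for the example):

```python
# Local illustration of the MapReduce pattern: map -> shuffle (group by key) -> reduce.
from collections import defaultdict


def map_phase(lines):
    """Mapper: emit a (key, value) pair per word, as a MapReduce mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all values by key, mimicking the framework's sort/shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["Hadoop MapReduce", "Spark and Hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
assert counts["hadoop"] == 2  # "Hadoop" appears in both input lines
```

On a real cluster the shuffle is performed by the framework between the mapper and reducer tasks; in Spark the equivalent is a `reduceByKey` over an RDD of `(word, 1)` pairs.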
Environment: HDFS, Hadoop, Kafka, MapReduce, Spark, Impala, Hive, Avro, Parquet, Scala, Java.
Confidential, New York, New York
Hadoop Developer
Responsibilities:
- Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Worked on importing and exporting data from Oracle into HDFS using Sqoop for analysis, visualization and to generate reports.
- Performed full and incremental imports and created Sqoop jobs.
- Implemented multiple MapReduce jobs for data processing.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Control-M.
- Used Zookeeper to provide coordination services to the cluster.
- Involved in loading data from edge node to HDFS using shell scripting.
- Created internal and external tables in Hive and merged data sets using Hive joins. Involved in the integration of Hive and HBase.
- Designed and developed Hive managed/external tables using struct, map and array types with various storage formats.
- Implemented various performance techniques (partitioning, bucketing) in Hive to get better performance.
- Worked with different Hadoop file formats like Parquet and compression techniques like gzip & Snappy.
- Built real time pipeline for streaming data using Kafka and Spark Streaming.
- Implemented Python scripts to perform transformations and load the data into Hive.
- Worked on the Spark SQL for analyzing the data.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
- Developed Spark code to read data from Hive, group the fields and generate XML files.
- Implemented Spark and Spark SQL for faster testing and processing of data.
- Involved in HDFS maintenance and loading of structured and unstructured data.
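The "read from Hive, group the fields and generate XML files" step above amounts to grouping rows by a key and serializing each group. A standalone Python sketch of that shape using only the standard library (the record layout and field names are hypothetical, and a real job would read the rows from Hive rather than a literal list):

```python
# Group records by a key field and emit one XML element per group,
# mirroring the "read, group, generate XML" flow described above.
from itertools import groupby
from xml.etree.ElementTree import Element, SubElement, tostring

# Placeholder rows standing in for the result of a Hive query.
records = [
    {"customer": "c1", "amount": "10"},
    {"customer": "c1", "amount": "5"},
    {"customer": "c2", "amount": "7"},
]


def to_xml(rows):
    """Serialize rows as XML, one <customer> element per distinct key."""
    root = Element("customers")
    rows = sorted(rows, key=lambda r: r["customer"])  # groupby requires sorted input
    for cust, group in groupby(rows, key=lambda r: r["customer"]):
        node = SubElement(root, "customer", id=cust)
        for r in group:
            SubElement(node, "order", amount=r["amount"])
    return tostring(root, encoding="unicode")


xml = to_xml(records)
assert xml.count("<customer ") == 2  # two distinct customers in the sample data
```

In Spark the grouping would typically be a `groupByKey` or a DataFrame `groupBy`, with the XML serialization applied per group before writing the files out to HDFS.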
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
