
Hadoop And Spark Developer Resume

Framingham, MA


  • Hadoop/Spark developer with 4+ years of experience in big data application development using the Hadoop and Spark frameworks.
  • Excellent knowledge and understanding of Distributed Computing and Parallel processing frameworks.
  • Experience with the Hadoop ecosystem, including HDFS (Hadoop Distributed File System), MapReduce, Hive, Sqoop, Oozie, and Pig.
  • Hands-on experience with the Cloudera and Hortonworks distributions.
  • Experience in data cleansing using Spark map and filter functions.
  • Experience in creating Hive tables and loading data from different file formats.
  • Implemented partitioning and bucketing in Hive.
  • Experience developing and debugging Hive queries.
  • Good experience importing data into and exporting data out of Hive and HDFS with Sqoop.
  • Experience with Apache Spark, Spark SQL, Spark Streaming.
  • Improved performance and optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience in in-memory data processing with Apache Spark.
  • Experience with file formats such as text, SequenceFile, JSON, Parquet, and ORC.
  • Experienced at performing read and write operations on HDFS file system.
  • Experience working with large data sets and making performance improvements.
  • Strong knowledge on UNIX/LINUX commands.
  • Working knowledge of the Scala language.
  • Working knowledge of Scrum, Agile, and Waterfall methodologies.
  • Highly motivated and committed to the highest levels of professionalism.
  • Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Proficient in using SQL to manipulate data: query expressions, joins, subqueries, etc.
  • Good experience in Scala Programming.
  • Detail-oriented professional, ensuring the highest level of quality in reports and data analysis.
  • Experience in processing data using HiveQL and Pig Latin scripts for data analytics.
  • Experience using the Spark Streaming programming model with Kafka for real-time data processing.
  • Experience in designing and developing application in Spark using Scala.
  • Experience using the Producer and Consumer APIs of Apache Kafka.
  • Experienced with Apache Kafka for collecting logs and error messages across the cluster.
  • Experience creating and driving large scale ETL pipelines.
  • Good with version control systems such as Git.
  • Provided batch-processing solutions for large volumes of unstructured data using Spark.
  • Extended Hive core functionality by writing UDFs for data analysis.
  • Experience in data modeling, data warehousing, ETL processing, database programming, ETL testing, and DBMS.
  • Advanced written and verbal communication skills.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Substantial experience working in a fast-paced, Agile software development framework, applying Scrum principles in an ownership- and results-oriented culture.
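The data cleansing with Spark map and filter functions mentioned above can be sketched on plain Scala collections; in Spark the same chain would run on an RDD produced by sc.textFile(...). The records and field layout below are hypothetical, purely for illustration:

```scala
// A minimal sketch of the map/filter cleansing pattern on plain Scala
// collections; the identical chain applies to a Spark RDD of lines.
// The raw records are hypothetical.
val raw = List("1,alice,29", "2,bob,", "bad line", "3,carol,41")

val cleaned = raw
  .map(_.split(",", -1))                                  // parse lines into fields
  .filter(_.length == 3)                                  // drop malformed records
  .filter(f => f(2).nonEmpty && f(2).forall(_.isDigit))   // keep numeric ages only
  .map(f => (f(0).toInt, f(1), f(2).toInt))               // typed (id, name, age)

println(cleaned)   // List((1,alice,29), (3,carol,41))
```

The same map/filter composition carries over unchanged when `raw` is an RDD, which is what makes the pattern convenient for distributed cleansing.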


HADOOP: Detailed knowledge of Hadoop components and MapReduce


Wrote complex HQL queries using analytic functions such as RANK, DENSE_RANK, and CUME_DIST (cumulative distribution), and developed complex join queries.
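A hedged illustration of the analytic-function style described above, against a hypothetical sales table (the table and column names are placeholders, not from an actual project):

```sql
-- Rank each product within its category by revenue (hypothetical schema).
SELECT category, product, revenue,
       RANK()       OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS dense_rnk,
       CUME_DIST()  OVER (PARTITION BY category ORDER BY revenue DESC) AS cum_dist
FROM sales;
```

RANK leaves gaps after ties while DENSE_RANK does not; CUME_DIST gives the cumulative fraction of rows at or above each revenue value within its category.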


Developed optimized, complex Pig scripts using advanced functions such as COGROUP and nested FOREACH.


Imported data from RDBMS to HDFS.


Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.


Developed Scala scripts and UDFs using both the DataFrame/SQL/Dataset and RDD/MapReduce APIs in Spark 2.1 for data aggregation.


Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.


Scheduled jobs using Oozie scripts.
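Oozie scheduling of the kind listed above is typically expressed as a workflow definition; below is a minimal hypothetical sketch (the workflow name, table, script, and ${...} properties are placeholders, not from an actual deployment):

```xml
<!-- Hypothetical Oozie workflow: a Sqoop import followed by a Hive step. -->
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.3">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table orders --target-dir ${stagingDir}</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Ingest failed</message></kill>
  <end name="end"/>
</workflow-app>
```

A coordinator definition would then trigger this workflow on a time or data-availability schedule.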


Spark with Scala


Hadoop and Spark Developer

Confidential, Framingham, MA


  • Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data.
  • Developed Sqoop jobs to import data in Avro format from RDBMS into HDFS and created Hive tables on top of it.
  • Performed Spark jobs such as transformations and actions on RDDs using Scala.
  • Implemented Spark SQL to access Hive tables from Spark for faster data processing.
  • Worked on converting queries written in Hive into Spark applications.
  • Explored Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files.
  • Worked with various HDFS file formats like Avro, Sequence File, Parquet and various compression formats.
  • Good understanding on NoSQL databases such as HBase and MongoDB.
  • Good understanding on Kafka architecture i.e., Topics, Consumers, Producers, Brokers, Partitions and Clusters.
  • Provided support to data analysts in running Pig and Hive queries.
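The Spark SQL and DataFrame work in the bullets above might look like the following sketch. It assumes a Spark 2.x session with Hive support, is not runnable outside a cluster, and every table, path, and column name is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("customer-ingest")
  .enableHiveSupport()      // lets Spark SQL read Hive tables directly
  .getOrCreate()
import spark.implicits._

// Access a Hive table through Spark SQL for faster in-memory processing.
val orders = spark.sql("SELECT customer_id, amount FROM sales.orders")

// Preprocess: flatten nested JSON documents into a flat, delimited layout.
val events = spark.read.json("hdfs:///data/raw/events/*.json")
val flat = events.select($"user.id".as("user_id"), $"event.type".as("event_type"))
flat.write.option("sep", "\t").csv("hdfs:///data/flat/events")
```

Reading the Hive table via `spark.sql` keeps the data in memory across transformations, which is the performance gain the bullets refer to compared with Hive-on-MapReduce.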

Environment: Cloudera (CDH 5), HDFS, Spark, Spark SQL, Hive, MapReduce, Hue, Sqoop, Flume, Oozie, PuTTY, Scala, Linux, YARN, Agile methodology.

Hadoop Developer

Confidential, Columbus, OH


  • Designed and Developed data migration from legacy systems to Hadoop environment.
  • Imported data from RDBMS into HDFS and Hive and performed incremental imports using Sqoop jobs, in formats such as Avro, text, and Parquet.
  • Performed various Sqoop operations such as eval, import, export, and job.
  • Created external and managed tables in Hive, loaded data into them, and processed Hive queries that run internally as MapReduce jobs.
  • Pre-processed log data in Pig Latin by parsing it with regular expressions.
  • Processed XML and JSON file formats; created partitions and implemented bucketing in Hive for performance optimization.
  • Used the JSON, XML, and Avro SerDes packaged with Hive for serialization and deserialization when parsing file contents.
  • Loaded data with complex data types such as maps, arrays, and structs into Hive tables.
  • Experience in building Pig Latin scripts to extract, transform and load data onto HDFS.
  • Developed a workflow in Oozie to automate the task of loading the data into HDFS using Sqoop and processing it with Hive.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Used Cloudera Manager to pull metrics on cluster features such as the JVM and running map and reduce tasks.
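The incremental Sqoop imports described above could be set up along these lines. The connection string, credentials, table name, and paths are placeholders, and the commands assume a configured Hadoop/Sqoop client:

```shell
# Hypothetical saved Sqoop job with incremental append on a key column.
sqoop job --create daily_orders_import -- import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user --password-file /user/etl/.pw \
  --table orders \
  --as-avrodatafile \
  --target-dir /data/raw/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

sqoop job --exec daily_orders_import   # re-runs pick up only new rows
```

A saved job persists the `--last-value` watermark between executions, which is what makes the repeated runs incremental.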

Environment: Hadoop (CDH 5), Hive, Pig, Sqoop, Flume, MapReduce, HDFS, Hue

Hadoop developer

Confidential, Jersey City, NJ


  • Ingested data from relational databases into HDFS using Sqoop and processed it with Hive jobs.
  • Created data pipeline using Hive, Spark, and HBase to ingest, transform and analyze the customer behavioral data.
  • Performed partitioning and bucketing in Hive to improve the performance.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Implemented Spark RDD transformations and performed actions to implement business analysis.
  • Involved in creating DataFrames.
  • Created Spark jobs to write the data to HBase tables.
  • Involved in improving performances of Spark jobs at application level.
  • Involved in tuning the Hive jobs.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Used Spark for interactive queries, processing of batch data and integration with NoSQL database for huge volume of data.
  • Involved in writing the data to HBase tables using Hive and Pig.
  • Used Hive scripts to compute aggregations and store them on HBase for low latency applications.
  • Collected, aggregated, and moved log data from servers to HDFS using Flume.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers, following Agile methodology.
  • Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.
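Partitioning and bucketing of behavioral data, as performed in this role, is declared at table creation time; a hypothetical sketch (table and column names are illustrative only):

```sql
-- Partition by day, bucket by customer for efficient joins and sampling.
CREATE TABLE behavior (
  customer_id BIGINT,
  event_type  STRING,
  ts          TIMESTAMP
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Load with dynamic partitioning so each day lands in its own partition.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE behavior PARTITION (event_date)
SELECT customer_id, event_type, ts, to_date(ts) FROM raw_behavior;
```

Partition pruning then lets queries on a date range skip irrelevant directories, while bucketing on customer_id supports bucketed map joins.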

Environment: Sqoop, Flume, Hive, HBase, HDFS, YARN, Spark, Cloudera (CDH5), Zookeeper, Shell Scripting, Linux

Hadoop Developer



  • Extracted the data from MySQL into HDFS using Sqoop.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Solved performance issues in Hive and Pig scripts by analyzing joins, grouping, and aggregation.
  • Developed Pig-Latin scripts to extract data from the web server output files to load into HDFS.
  • Worked on Hue interface for querying the data in Hive and Pig editors.
  • Experience using SequenceFile, Avro, text, and Parquet formats.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Exported the processed data from HDFS to RDBMS using Sqoop Export.
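Collecting web-server logs into HDFS with Flume, as in the bullets above, is driven by an agent configuration file; a minimal hypothetical example (the agent name, log path, HDFS path, and capacities are placeholders):

```properties
# Hypothetical Flume agent: tail a web-server access log into HDFS.
agent1.sources  = weblog
agent1.channels = mem
agent1.sinks    = hdfs-sink

agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem
agent1.sinks.hdfs-sink.hdfs.path = /data/raw/weblogs/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
```

The memory channel buffers events between the exec source and the HDFS sink; a file channel would trade throughput for durability.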

Environment: Hadoop, Sqoop, Hive, Pig, MapReduce, HDFS
