Hadoop Developer Resume

Phoenix, AZ

SUMMARY:

  • Results-oriented software development professional with 6 years' experience designing and developing Big Data solutions on highly scalable, end-to-end Hadoop infrastructure, including large-scale data pipelines, data lakes, data warehouses, real-time analytics, and reporting solutions.
  • Experience with and deep understanding of Hadoop ecosystem tools such as HDFS, Hive, Sqoop, Oozie, Kafka, YARN, and Spark.
  • Good knowledge of architecting new proposals using cloud technologies, Hadoop ecosystem tools, and reporting and modeling tools.
  • Experience in distributed big data systems with Hadoop using Cloudera (CDH) and Hortonworks (HDP)
  • Experience with Apache Hadoop and Apache Spark for analyzing large data sets efficiently.
  • Good knowledge of Hive's analytical functions and of extending core Hive functionality with custom queries.
  • Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
  • Experience with MemSQL; imported data from MemSQL into Spark.
  • Experience handling different file formats such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML, and flat files.
  • Good understanding of and experience in data mining techniques such as classification, clustering, regression, and optimization.
  • Proficient in working with NoSQL databases such as HBase and MongoDB.
  • Experience fine-tuning applications written in Spark and Hive to improve overall pipeline performance.
  • Strong understanding of real-time streaming technologies such as Spark Streaming and Kafka.
  • Good knowledge of Kafka, including sustaining reads and writes of thousands of megabytes per second on streaming data.
  • Experience in writing complex SQL queries, creating reports and dashboards.
  • Experienced in using Sqoop to import data from RDBMS into HDFS/Hive and to export data back from HDFS/Hive to RDBMS.
  • Hands-on experience in Hive data modeling, with a very good understanding of partitioning and bucketing concepts; designed both managed and external tables in Hive to optimize performance (a minimal sketch follows this list).
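
A minimal sketch of the partitioned and bucketed Hive table design described above, issued through Spark SQL with Hive support. The table names, columns, HDFS location, and bucket count are hypothetical, not taken from the actual projects.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableDesignSketch {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark 2.x build with Hive support on the classpath.
    val spark = SparkSession.builder()
      .appName("hive-table-design-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata; the data stays at the given HDFS path,
    // so dropping the table does not delete the files.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        |  order_id BIGINT,
        |  amount   DOUBLE
        |)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/warehouse/sales_ext'""".stripMargin)

    // Managed table with bucketing: Hive owns the data, and bucketing by order_id
    // helps joins and sampling on that key.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_managed (
        |  order_id BIGINT,
        |  amount   DOUBLE
        |)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (order_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    spark.stop()
  }
}
```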

TECHNICAL SKILLS:

Languages: Scala, Python, SQL

IaaS: AWS, Google Cloud Platform

Distributed data processing: Hadoop, Spark

Distributed databases: Cassandra, HBase

Distributed query engine: AWS Athena, Hive, Presto

Distributed file systems: HDFS, S3

Distributed computing environment: Amazon EMR, Hortonworks

Data Ingestion: Kafka, Amazon Kinesis, Firehose

Relational databases: Oracle, MySQL, IBM DB2, MS SQL Server

Source Control: Git, Subversion

EXPERIENCE:

Confidential - Phoenix, AZ

Hadoop Developer

  • Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.
  • Responsible for building scalable distributed data solutions using Hadoop and Spark.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Developed Spark jobs using Scala on top of YARN for interactive and batch analysis.
  • Handled importing of data from various data sources, performed transformations using Spark, and loaded the data into Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used storage formats such as Avro to speed up access to data in complex queries.
  • Used Spark SQL to read Parquet data and create tables in Hive through the Scala API (see the sketch after this list).
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Imported data from our relational data stores to Hadoop using Sqoop.
  • Overwrote the Hive data with HBase data daily to keep it fresh, and used Sqoop to load data from DB2 into the HBase environment.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Worked with BI team to create various kinds of reports using Tableau based on the client needs.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
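
A minimal sketch of the summarize-and-load pattern described above: read Parquet from HDFS with Spark SQL and persist the aggregated result as a Hive table through the Scala API. The paths, column names, and target database are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DailySummaryJob {
  def main(args: Array[String]): Unit = {
    // Assumes Spark 2.x with Hive support so saveAsTable lands in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("daily-summary-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read the raw Parquet data already landed on HDFS.
    val transactions = spark.read.parquet("/data/raw/transactions")
    transactions.createOrReplaceTempView("transactions")

    // Summarize with Spark SQL.
    val dailySummary = spark.sql(
      """SELECT account_id,
        |       to_date(event_ts) AS event_date,
        |       SUM(amount)       AS total_amount
        |FROM transactions
        |GROUP BY account_id, to_date(event_ts)""".stripMargin)

    // Persist the result as a Hive table (the analytics database is assumed to exist).
    dailySummary.write
      .mode("overwrite")
      .saveAsTable("analytics.daily_summary")

    spark.stop()
  }
}
```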

Confidential - Atlanta, GA

Hadoop Developer

  • Worked with the Apache Spark API and Spark Streaming, creating queries against files produced by MapReduce jobs.
  • Implemented Spark RDD transformations and actions to support business analysis (see the sketch after this list).
  • Created data pipeline using HBase, Spark, and Hive to ingest, transform and analyze the customer behavioral data
  • Experienced with Spark Context, Spark-SQL, Data Frames, Pair RDDs and YARN.
  • Wrote complex MapReduce jobs to perform data cleansing and ETL-like processing on the data.
  • Worked with different file formats such as text, Avro, and Parquet using MapReduce programs.
  • Developed Hive Scripts to create partitioned tables and create various analytical datasets.
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems.
  • Extensively used Hive queries to query data in Hive Tables and loaded data into HBase tables.
  • Exported the processed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Spark for interactive queries, batch processing, and integration with a NoSQL database for huge volumes of data.
  • Used Hive partitioning and bucketing concepts to improve the performance of Hive query processing.
  • Designed Oozie workflows for job scheduling and batch processing.
  • Helped analytics team by writing Hive scripts to perform further detailed analysis of the data processed.
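
A minimal sketch of the RDD transformation-and-action style referenced above, applied to a simple cleansing and aggregation pass. The record layout, delimiter, and paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object CustomerEventCleansing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-cleansing-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Raw pipe-delimited events assumed to look like: customerId|eventType|amount
    val raw = sc.textFile("/data/raw/customer_events")

    // Transformations: parse, drop malformed records, and total spend per customer.
    val spendPerCustomer = raw
      .map(_.split('|'))
      .filter(_.length == 3)
      .map(fields => (fields(0), fields(2).toDouble))
      .reduceByKey(_ + _)

    // Action: pull a small sample back to the driver as a sanity check.
    spendPerCustomer.take(10).foreach(println)

    // Action: write the aggregated result back to HDFS.
    spendPerCustomer
      .map { case (customerId, total) => s"$customerId,$total" }
      .saveAsTextFile("/data/processed/customer_spend")

    spark.stop()
  }
}
```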

Confidential - New Jersey

Hadoop Engineer

  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (see the sketch after this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in collecting, aggregating and moving log data from servers to HDFS using Flume.
  • Imported and exported data between relational data sources such as DB2, SQL Server, and Teradata and HDFS using Sqoop.
  • Ingested data from legacy and upstream systems into HDFS using Apache Sqoop, Flume, and Hive queries.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
  • Wrote Sqoop scripts for importing and exporting data between RDBMS and HDFS.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Developed MapReduce programs to cleanse data in HDFS obtained from multiple sources and make it suitable for ingestion into the Hive schema for analysis.
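
The ingest described above used Sqoop; as a hedged illustration kept in the same language as the other sketches, the snippet below shows the equivalent relational-to-HDFS pull done through Spark's JDBC source and written out as CSV. The connection URL, credentials, table, and split column are hypothetical, and the MySQL JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object JdbcIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-ingest-sketch").getOrCreate()

    // Read the source table, splitting the scan across executors by primary key.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Land the data on HDFS as CSV for downstream Hive/Spark processing.
    orders.write
      .option("header", "true")
      .mode("overwrite")
      .csv("/data/landing/orders_csv")

    spark.stop()
  }
}
```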

Confidential - Phoenix, AZ

Data Analyst

  • Performed in-depth analysis of data and prepared daily reports using SQL, MS Excel, MS PowerPoint, and SharePoint.
  • Created complex SQL queries and scripts to extract and aggregate data to validate the accuracy of the data.
  • Prepared high-level analysis reports with Excel and Tableau that provided feedback on data quality, including identification of patterns and outliers.
  • Used advanced Microsoft Excel techniques such as pivot tables and VLOOKUP to create intuitive dashboards in Excel.
  • Created pivot tables and charts using MS Excel worksheet data and external resources; modified pivot tables, sorted items and grouped data, and refreshed and formatted pivot tables.
  • Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts.
