Hadoop Developer Resume
Phoenix, AZ
SUMMARY:
- Results-oriented software development professional with 6 years' experience in the design and development of big data solutions on highly scalable, end-to-end Hadoop infrastructure, building large-scale data pipelines, data lakes, data warehouses, real-time analytics, and reporting solutions to solve business problems.
- Experience with and deep understanding of Hadoop ecosystem tools, including HDFS, Hive, Sqoop, Oozie, Kafka, YARN, and Spark.
- Good knowledge of architecting new proposals using cloud technologies, Hadoop ecosystem tools, and reporting and modeling tools.
- Experience with distributed big data systems on Hadoop using the Cloudera (CDH) and Hortonworks (HDP) distributions.
- Experience with Apache Hadoop and Apache Spark for analyzing large data sets efficiently.
- Good knowledge of Hive analytical functions and of extending Hive core functionality with custom queries and user-defined functions.
- Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
- Experience with MemSQL, including importing data from MemSQL into Spark.
- Experience handling different file formats such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML, and flat files.
- Good understanding of and experience with data mining techniques such as classification, clustering, regression, and optimization.
- Proficient in working with NoSQL databases such as HBase and MongoDB.
- Experience in fine-tuning Spark and Hive applications to improve the overall performance of data pipelines.
- Strong understanding of the real-time streaming technologies Spark and Kafka.
- Good knowledge of Kafka for handling reads and writes of thousands of megabytes per second on streaming data.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Experienced in using Sqoop to import data from RDBMS into HDFS/Hive and to export data from HDFS/Hive back to RDBMS.
- Hands-on experience in Hive data modeling, with a very good understanding of partitioning and bucketing concepts; designed both managed and external Hive tables to optimize performance (a short sketch follows this summary).
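A minimal sketch of the Hive data-modeling pattern referenced above, written with the Spark/Scala API used throughout this resume. The database, table, column names, and HDFS path are illustrative assumptions, not taken from any specific engagement.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch: a partitioned external Hive table plus a bucketed table written
// through the DataFrame API. All names and paths are hypothetical.
object HiveTableModelingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-modeling-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: the data stays at an HDFS location the pipeline owns;
    // partitioning by load_date prunes the files scanned per query.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.transactions (
        |  customer_id BIGINT,
        |  amount      DOUBLE,
        |  channel     STRING
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/analytics/transactions'""".stripMargin)

    // Bucketed managed table: bucketing by the join key speeds up joins
    // and sampling on customer_id.
    spark.table("staging.transactions_raw")   // hypothetical staging table
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("analytics.transactions_bucketed")

    spark.stop()
  }
}
```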
TECHNICAL SKILLS:
Languages: Scala, Python, SQL
IaaS: AWS, Google Cloud Platform
Distributed data processing: Hadoop, Spark
Distributed databases: Cassandra, HBase
Distributed query engine: AWS Athena, Hive, Presto
Distributed file systems: HDFS, S3
Distributed computing environment: Amazon EMR, Hortonworks
Data Ingestion: Kafka, Amazon Kinesis, Kinesis Firehose
Relational databases: Oracle, MySQL, IBM DB2, MS SQL Server
Source Control: Git, Subversion
EXPERIENCE:
Confidential - Phoenix, AZ
Hadoop Developer
- Developed Spark applications in Scala using the DataFrame and Spark SQL APIs for faster data processing.
- Responsible for building scalable distributed data solutions using Hadoop and Spark.
- Developed Spark and Hive jobs to summarize and transform data.
- Developed Spark jobs in Scala on YARN for interactive and batch analysis.
- Imported data from various data sources, performed transformations using Spark, and loaded the data into Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used optimized storage formats such as Avro and Parquet to serve complex queries efficiently.
- Used Spark SQL with the Scala API to read Parquet data and create tables in Hive (see the sketch at the end of this section).
- Developed a data pipeline using Kafka to store data into HDFS.
- Imported data from our relational data stores to Hadoop using Sqoop.
- Overwrote the Hive data with HBase data daily to keep it fresh, and used Sqoop to load data from DB2 into the HBase environment.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
- Developed HiveQL scripts to de-normalize and aggregate the data.
- Worked with BI team to create various kinds of reports using Tableau based on the client needs.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
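A minimal sketch of the kind of Spark SQL job described in this section: reading Parquet from HDFS, summarizing with the DataFrame API, and registering the result as a Hive table. The paths, database, and column names are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch: Parquet in, DataFrame aggregation, Hive table out.
object ParquetToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-hive-sketch")
      .enableHiveSupport()        // lets saveAsTable write to the Hive metastore
      .getOrCreate()
    import spark.implicits._

    // Read columnar Parquet data produced upstream.
    val events = spark.read.parquet("hdfs:///data/raw/events")

    // Summarize/transform with the DataFrame API.
    val daily = events
      .groupBy($"event_date", $"event_type")
      .count()
      .withColumnRenamed("count", "event_count")

    // Persist the result as a Hive table for downstream Hive/BI queries.
    daily.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.daily_event_counts")

    spark.stop()
  }
}
```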
Confidential - Atlanta, GA
Hadoop Developer
- Worked with the Apache Spark API and Spark Streaming, creating queries against data produced by MapReduce jobs.
- Implemented Spark RDD transformations and actions to carry out business analysis (a short sketch follows this section).
- Created a data pipeline using HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Wrote complex MapReduce jobs to perform data cleansing and ETL-like processing on the data.
- Worked with different file formats such as text, Avro, and Parquet using MapReduce programs.
- Developed Hive Scripts to create partitioned tables and create various analytical datasets.
- Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions that derive business insights and solve client operational and strategic problems.
- Extensively used Hive queries to query data in Hive Tables and loaded data into HBase tables.
- Exported the processed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Spark for interactive queries, batch data processing, and integration with NoSQL databases for huge volumes of data.
- Used Hive Partitioning and Bucketing concepts to increase the performance of Hive Query processing.
- Designed Oozie workflows for job scheduling and batch processing.
- Helped analytics team by writing Hive scripts to perform further detailed analysis of the data processed.
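A short sketch of the RDD transformations and actions mentioned above: parse delimited records, drop malformed rows, and aggregate per key. The input path and record layout are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: RDD transformations (lazy) followed by an action that runs the job.
object RddTransformationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-transformation-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.textFile("hdfs:///data/raw/customer_events")

    // Transformations are lazy: nothing runs until an action is called.
    val amountsByCustomer = lines
      .map(_.split('|'))
      .filter(_.length >= 3)              // drop malformed rows
      .map(f => (f(0), f(2).toDouble))    // (customerId, amount)
      .reduceByKey(_ + _)                 // total amount per customer

    // Action: triggers the computation and brings a sample to the driver.
    amountsByCustomer.take(20).foreach { case (id, total) =>
      println(s"$id -> $total")
    }

    spark.stop()
  }
}
```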
Confidential - New Jersey
Hadoop Engineer
- Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (see the sketch at the end of this section).
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving log data from servers to HDFS using Flume.
- Imported and exported data between relational data sources such as DB2, SQL Server, and Teradata and HDFS using Sqoop.
- Ingested data from legacy and upstream systems into HDFS using Apache Sqoop, Flume, and Hive queries.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
- Experience writing Sqoop scripts to import and export data between RDBMS and HDFS.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Developed MapReduce programs to cleanse data in HDFS obtained from multiple sources and make it suitable for ingestion into the Hive schema for analysis.
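A minimal sketch of cleansing Sqoop-landed CSV data in HDFS and publishing it to a Hive table, shown here with Spark (as used in this role for processing) rather than raw MapReduce. The paths, schema, and table names are hypothetical assumptions.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, trim}

// Sketch: read Sqoop-imported CSV, cleanse it, and expose it via Hive.
object SqoopCsvCleansingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sqoop-csv-cleansing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // CSV files written to HDFS by a Sqoop import job (no header row).
    val raw = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/sqoop/orders")
      .toDF("order_id", "customer_id", "status", "order_total")

    // Basic cleansing: trim strings, drop rows missing keys, remove duplicates.
    val cleansed = raw
      .withColumn("status", trim(col("status")))
      .na.drop(Seq("order_id", "customer_id"))
      .dropDuplicates("order_id")

    // Publish into a Hive table for downstream analysis.
    cleansed.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.orders_clean")

    spark.stop()
  }
}
```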
Confidential - Phoenix, AZ
Data Analyst
- Performed in-depth data analysis and prepared daily reports using SQL, MS Excel, MS PowerPoint, and SharePoint.
- Created complex SQL queries and scripts to extract and aggregate data and validate its accuracy.
- Prepared high-level analysis reports with Excel and Tableau that provided feedback on data quality, including identification of patterns and outliers.
- Used advanced Microsoft Excel techniques such as pivot tables and VLOOKUP to create intuitive dashboards in Excel.
- Created pivot tables and charts using MS Excel worksheet data and external sources; modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts.