Hadoop Developer Resume
Phoenix, AZ
SUMMARY:
- Results-oriented software development professional with 6 years' experience in the design and development of big data solutions on highly scalable, end-to-end Hadoop infrastructure, building large-scale data pipelines, data lakes, data warehouses, real-time analytics, and reporting solutions to solve business problems.
- Experience with and deep understanding of Hadoop ecosystem tools, including HDFS, Hive, Sqoop, Oozie, Kafka, YARN, and Spark.
- Good knowledge of architecting new proposals using cloud technologies, Hadoop ecosystem tools, and reporting and modeling tools.
- Experience with distributed big data systems on Hadoop using the Cloudera (CDH) and Hortonworks (HDP) distributions.
- Experience with Apache Hadoop and Apache Spark for analyzing large data sets efficiently.
- Good knowledge of Hive analytical functions and of extending Hive core functionality with custom queries and user-defined functions.
- Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
- Experience with MemSQL, including importing data from MemSQL into Spark.
- Experience handling different file formats such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML, and flat files.
- Good understanding of and experience with data mining techniques such as classification, clustering, regression, and optimization.
- Proficient in working with NoSQL databases such as HBase and MongoDB.
- Experience in fine-tuning Spark and Hive applications to improve the overall performance of data pipelines.
- Strong understanding of the real-time streaming technologies Spark and Kafka.
- Good knowledge of Kafka for handling reads and writes of thousands of megabytes per second on streaming data.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Experienced in using Sqoop to import data from RDBMS into HDFS/Hive and to export data from HDFS/Hive back to RDBMS.
- Hands-on experience in Hive data modeling, with a very good understanding of partitioning and bucketing concepts; designed both managed and external Hive tables to optimize performance (a short sketch follows this summary).
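A minimal sketch of the Hive data-modeling pattern referenced above, written with the Spark/Scala API used throughout this resume. The database, table, column names, and HDFS path are illustrative assumptions, not taken from any specific engagement.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch: a partitioned external Hive table plus a bucketed table written
// through the DataFrame API. All names and paths are hypothetical.
object HiveTableModelingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-modeling-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: the data stays at an HDFS location the pipeline owns;
    // partitioning by load_date prunes the files scanned per query.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.transactions (
        |  customer_id BIGINT,
        |  amount      DOUBLE,
        |  channel     STRING
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/analytics/transactions'""".stripMargin)

    // Bucketed managed table: bucketing by the join key speeds up joins
    // and sampling on customer_id.
    spark.table("staging.transactions_raw")   // hypothetical staging table
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("analytics.transactions_bucketed")

    spark.stop()
  }
}
```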
TECHNICAL SKILLS:
Languages: Scala, Python, SQL
IaaS: AWS, Google Cloud Platform
Distributed data processing: Hadoop, Spark
Distributed databases: Cassandra, HBase
Distributed query engine: AWS Athena, Hive, Presto
Distributed file systems: HDFS, S3
Distributed computing environment: Amazon EMR, Hortonworks
Data Ingestion: Kafka, Amazon Kinesis, Kinesis Firehose
Relational databases: Oracle, MySQL, IBM DB2, MS SQL Server
Source Control: Git, Subversion
EXPERIENCE:
Confidential - Phoenix, AZ
Hadoop Developer
- Developed Spark applications in Scala using the DataFrame and Spark SQL APIs for faster data processing.
- Responsible for building scalable distributed data solutions using Hadoop and Spark.
- Developed Spark and Hive jobs to summarize and transform data.
- Developed Spark jobs in Scala on YARN for interactive and batch analysis.
- Imported data from various data sources, performed transformations using Spark, and loaded the data into Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used optimized storage formats such as Avro and Parquet to serve complex queries efficiently.
- Used Spark SQL with the Scala API to read Parquet data and create tables in Hive (see the sketch at the end of this section).
- Developed a data pipeline using Kafka to store data into HDFS.
- Imported data from our relational data stores to Hadoop using Sqoop.
- Overwrote the Hive data with HBase data daily to keep it fresh, and used Sqoop to load data from DB2 into the HBase environment.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
- Developed HiveQL scripts to de-normalize and aggregate the data.
- Worked with BI team to create various kinds of reports using Tableau based on the client needs.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
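A minimal sketch of the kind of Spark SQL job described in this section: reading Parquet from HDFS, summarizing with the DataFrame API, and registering the result as a Hive table. The paths, database, and column names are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch: Parquet in, DataFrame aggregation, Hive table out.
object ParquetToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-hive-sketch")
      .enableHiveSupport()        // lets saveAsTable write to the Hive metastore
      .getOrCreate()
    import spark.implicits._

    // Read columnar Parquet data produced upstream.
    val events = spark.read.parquet("hdfs:///data/raw/events")

    // Summarize/transform with the DataFrame API.
    val daily = events
      .groupBy($"event_date", $"event_type")
      .count()
      .withColumnRenamed("count", "event_count")

    // Persist the result as a Hive table for downstream Hive/BI queries.
    daily.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.daily_event_counts")

    spark.stop()
  }
}
```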
Confidential - Atlanta, GA
Hadoop Developer
- Worked with the Apache Spark API and Spark Streaming, creating queries against data produced by MapReduce jobs.
- Implemented Spark RDD transformations and actions to carry out business analysis (a short sketch follows this section).
- Created a data pipeline using HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Wrote complex MapReduce jobs to perform data cleansing and ETL-like processing on the data.
- Worked with different file formats such as text, Avro, and Parquet using MapReduce programs.
- Developed Hive Scripts to create partitioned tables and create various analytical datasets.
- Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions that derive business insights and solve client operational and strategic problems.
- Extensively used Hive queries to query data in Hive Tables and loaded data into HBase tables.
- Exported the processed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Spark for interactive queries, batch data processing, and integration with NoSQL databases for huge volumes of data.
- Used Hive Partitioning and Bucketing concepts to increase the performance of Hive Query processing.
- Designed Oozie workflows for job scheduling and batch processing.
- Helped analytics team by writing Hive scripts to perform further detailed analysis of the data processed.
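A short sketch of the RDD transformations and actions mentioned above: parse delimited records, drop malformed rows, and aggregate per key. The input path and record layout are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: RDD transformations (lazy) followed by an action that runs the job.
object RddTransformationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-transformation-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.textFile("hdfs:///data/raw/customer_events")

    // Transformations are lazy: nothing runs until an action is called.
    val amountsByCustomer = lines
      .map(_.split('|'))
      .filter(_.length >= 3)              // drop malformed rows
      .map(f => (f(0), f(2).toDouble))    // (customerId, amount)
      .reduceByKey(_ + _)                 // total amount per customer

    // Action: triggers the computation and brings a sample to the driver.
    amountsByCustomer.take(20).foreach { case (id, total) =>
      println(s"$id -> $total")
    }

    spark.stop()
  }
}
```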
Confidential - New Jersey
Hadoop Engineer
- Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (see the sketch at the end of this section).
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving log data from servers to HDFS using Flume.
- Imported and exported data between relational data sources such as DB2, SQL Server, and Teradata and HDFS using Sqoop.
- Ingested data from legacy and upstream systems into HDFS using Apache Sqoop, Flume, and Hive queries.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
- Experience writing Sqoop scripts to import and export data between RDBMS and HDFS.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Developed MapReduce programs to cleanse data in HDFS obtained from multiple sources and make it suitable for ingestion into the Hive schema for analysis.
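A minimal sketch of cleansing Sqoop-landed CSV data in HDFS and publishing it to a Hive table, shown here with Spark (as used in this role for processing) rather than raw MapReduce. The paths, schema, and table names are hypothetical assumptions.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, trim}

// Sketch: read Sqoop-imported CSV, cleanse it, and expose it via Hive.
object SqoopCsvCleansingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sqoop-csv-cleansing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // CSV files written to HDFS by a Sqoop import job (no header row).
    val raw = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/sqoop/orders")
      .toDF("order_id", "customer_id", "status", "order_total")

    // Basic cleansing: trim strings, drop rows missing keys, remove duplicates.
    val cleansed = raw
      .withColumn("status", trim(col("status")))
      .na.drop(Seq("order_id", "customer_id"))
      .dropDuplicates("order_id")

    // Publish into a Hive table for downstream analysis.
    cleansed.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.orders_clean")

    spark.stop()
  }
}
```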
Confidential - Phoenix, AZ
Data Analyst
- Performed in-depth data analysis and prepared daily reports using SQL, MS Excel, MS PowerPoint, and SharePoint.
- Created complex SQL queries and scripts to extract and aggregate data and validate its accuracy.
- Prepared high-level analysis reports with Excel and Tableau that provided feedback on data quality, including identification of patterns and outliers.
- Used advanced Microsoft Excel techniques such as pivot tables and VLOOKUP to create intuitive dashboards in Excel.
- Created pivot tables and charts using MS Excel worksheet data and external sources; modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts.