Hadoop Developer Resume
Phoenix, AZ
SUMMARY:
- Over 6 years of overall IT experience, including 5 years in application integration and management across Cloud and Big Data environments.
- Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie and ZooKeeper, with working knowledge of the MapReduce/HDFS framework.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, NodeManager and the MapReduce programming paradigm.
- Knowledge of NoSQL databases such as MongoDB and Cassandra.
- Experience in shell scripting, used extensively with Spark for data processing.
- Developed cross-platform products working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
- Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases such as Cassandra and HBase.
- Analyzed data using HiveQL, Pig Latin and MapReduce programs in Java; extended Hive and Pig core functionality by implementing custom UDFs.
- Experience in AWS services like EMR, EC2 and S3.
- Good exposure to and experience with Spark, Scala, Big Data and the AWS stack.
- Hands-on experience creating RDDs and DataFrames from the required input data and performing data transformations using Spark and Scala.
- Hands-on experience with Cloudera and MapR Hadoop environments.
- Experience in writing Maven and SBT scripts to build and deploy Java and Scala Applications.
- Good understanding of Hadoop administration with Cloudera & MapR.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Experience in performance tuning and monitoring Hadoop clusters by gathering and analyzing metrics from the existing infrastructure using Cloudera Manager.
- Involved in production monitoring using Workflow Monitor; experienced in both development and support environments.
- Experienced with Waterfall, Agile and Scrum software development models.
- Strong knowledge of version control systems like SVN & GIT.
TECHNICAL SKILLS:
Hadoop: HDFS, Spark, Flume, Kafka, Oozie.
Hadoop Distributions: Cloudera, Hortonworks, Azure HDInsight.
Languages: Java, Scala, Python, Linux Shell Scripting, Azure PowerShell.
Scripts: JavaScript, Shell Scripting.
Database: Oracle 10g, MySQL, MSSQL.
NoSQL Databases: HBase, Cassandra, MongoDB.
Web Servers: Apache Tomcat.
Operating Systems: Windows, Linux (CentOS, Ubuntu).
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Hadoop Developer
Roles & Responsibilities:
- Developed various main and service classes in Scala using Spark SQL for requirement-specific tasks (a simplified sketch follows this section).
- Worked on a scalable cluster with 40 GB bandwidth nodes, following a two-step process of data ingestion followed by data processing.
- Hands-on coding in Scala, leveraging Apache Spark through its Scala APIs.
- Performed data preprocessing and cleaning using Hive and Pig.
- Populated HBase tables through automation and used the Spark shell for querying the database.
- Strong familiarity with Hive joins; used HiveQL for querying the databases, eventually building complex Hive UDFs.
- Handled the veracity of data from various sources: SequenceFile, Avro, text, Hive tables, batch streams and log files.
- Wrote wrapper scripts in Unix shell for automation.
- Build Tools used were Maven and SBT.
- Involved in data validation and fixing discrepancies in coordination with the Data Integration team.
- Implemented Spark batch jobs.
- Developed tasks and set up the required environment for running Hadoop in the cloud on various instances.
- Automated complex workflows using the Oozie workflow scheduler.
Environment: MapR Hadoop Distribution, Hive, Python, HBase, Sqoop, Maven builds, Spark, Spark SQL, Oozie, Linux/Unix, Shell Scripting, GIT.
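Illustrative sketch (not the original project code): a minimal Scala driver class of the kind described above, using Spark SQL with Hive support to load, transform and persist data. The database, table and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal Spark SQL driver class in the spirit of the requirement-specific tasks above.
// Database, table and column names are illustrative placeholders.
object CustomerSpendJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomerSpendJob")
      .enableHiveSupport() // read and write Hive tables managed on the cluster
      .getOrCreate()

    // Load raw records from a Hive table populated during the data ingestion step
    val raw = spark.sql(
      "SELECT customer_id, txn_amount, txn_date FROM staging_db.transactions")

    // Requirement-specific transformation: aggregate spend per customer per day
    val dailySpend = raw
      .groupBy(col("customer_id"), col("txn_date"))
      .agg(sum("txn_amount").alias("daily_spend"))

    // Persist the result back to Hive for downstream consumers (e.g., HBase loaders)
    dailySpend.write.mode("overwrite").saveAsTable("curated_db.daily_customer_spend")

    spark.stop()
  }
}
```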
Confidential, Irving, TX
Hadoop Developer
Roles & Responsibilities:
- Worked in a team on a 30-node cluster and grew it by adding nodes; additional DataNodes were configured through the Hadoop commissioning process.
- Monitored cluster status and health daily using the Hue UI.
- Created data import pipelines from MySQL and Oracle into HDFS using Sqoop.
- Stored data in AWS S3 in an HDFS-like fashion and performed EMR operations on the data stored in S3.
- Scripted AWS environments using a secured VPC, various data pipelines and a Redshift cluster.
- Wrote Hadoop MapReduce jobs in Java for processing data on HDFS.
- Wrote Spark applications in Scala utilizing Spark Core, DataFrames and Spark SQL (see the sketch after this section).
- Imported data from HDFS/HBase into Spark RDDs.
- Created Hive tables, implemented partitioning, dynamic partitions and bucketing, and created external tables to optimize performance.
- Extensively involved in loading data from the UNIX file system to HDFS.
- Involved in evaluating business requirements and preparing detailed specifications that follow project guidelines for program development.
- Performed CRUD operations in HBase.
- Developed Hive queries to process the data.
- Used Oozie to automate data ingestion and processing events.
- Generated aggregations, groupings and visualizations using Tableau.
Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, Yarn, Spark, Tableau, Cloudera Manager.
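A simplified sketch of the Spark/Scala pattern referenced above: reading Sqoop-landed files from HDFS into a DataFrame and appending them to a dynamically partitioned Hive table. The HDFS path, schema and table names are assumptions for illustration only.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object OrdersHdfsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrdersHdfsToHive")
      .enableHiveSupport()
      .config("hive.exec.dynamic.partition.mode", "nonstrict") // allow dynamic partitions
      .getOrCreate()

    // Read delimited files that Sqoop imported from MySQL/Oracle into HDFS (placeholder path)
    val orders = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/orders/")
      .toDF("order_id", "customer_id", "order_amount", "order_date")

    // Append into a partitioned Hive table so queries scan only the relevant dates
    orders.write
      .mode(SaveMode.Append)
      .partitionBy("order_date")
      .format("parquet")
      .saveAsTable("warehouse_db.orders")

    spark.stop()
  }
}
```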
Confidential, Bentonville, AR
Hadoop Developer
Roles & Responsibilities:
- Involved in all phases of installation and upgrades of the Hadoop big data platform; implemented security for the platform.
- Designed the sequence diagrams to depict the data flow into Hadoop.
- Involved in importing and exporting data between HDFS and Relational Systems like Oracle, MySQL and DB2 using Sqoop.
- Prepared SOPs for product installations, upgrades and other new processes; analyzed encryption methodologies and implemented them in the environment.
- Set up best practices for monitoring; analyzed hardware and software requirements for projects.
- Helped application and operations teams troubleshoot performance issues.
- Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables.
- Implemented real-time data ingestion and cluster handling using Apache Kafka.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (an indicative sketch follows this section).
Environment: Hadoop, HDFS, Pig, Hive, Spark, MapReduce, Java.
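An indicative Scala sketch of the Spark-on-YARN analytics described above: querying a Hive table and writing the aggregated output as Parquet files that Impala can also manage and query. Table, column and path names are placeholders, not the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Typically submitted with spark-submit --master yarn; cluster settings come from the Cloudera deployment.
object ClickstreamAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClickstreamAnalytics")
      .enableHiveSupport()
      .getOrCreate()

    // Analyze a Hive table (placeholder name) populated by the Kafka ingestion pipeline
    val clicks = spark.table("web_db.clickstream")

    // Count page views per URL
    val pageHits = clicks
      .filter(col("event_type") === "page_view")
      .groupBy(col("page_url"))
      .agg(count(lit(1)).alias("hits"))

    // Store the result as Parquet so Impala can manage and query the same files
    pageHits.write.mode("overwrite").parquet("hdfs:///data/curated/page_hits/")

    spark.stop()
  }
}
```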
Confidential
Hadoop Engineer
Roles & Responsibilities:
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required, applying ETL & BI concepts and testing methodologies.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Designed a data warehouse using Hive.
- Used Hive and MapReduce, and loaded data into HDFS.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Extensively worked on Sqoop for importing metadata from Oracle.
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Worked with business teams and created Hive queries for ad hoc access.
- Evaluated usage of Oozie for Workflow Orchestration.
- Mentored analysts and the test team in writing Hive queries.
- Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, Sqoop, AWS, Java, Oozie, MySQL
Confidential
Associate Engineer
Roles & Responsibilities:
- Created database objects like tables, views, indexes, stored-procedures, triggers, and user defined functions
- Wrote stored procedures and SQL scripts in SQL Server to implement business rules for various clients
- Wrote T-SQL queries for data retrieval
- Wrote and debugged T-SQL stored procedures, views and user-defined functions
- Performed data migration (import & export via BCP) from text files to SQL Server
- Implemented error handling using TRY-CATCH blocks
- Performed normalization and de-normalization of tables
- Developed backup and restore scripts for SQL Server 2008
- Installed and configured SQL Server 2008 with latest service packs
- Customized the stored procedures and database triggers to meet the changing business rules
- Implemented indexes for performance tuning.
- Wrote triggers, stored procedures and T-SQL queries to capture updated and deleted data from OLTP systems
- Designed data models using Erwin; developed physical data models and created DDL scripts to build the database schema and objects
- Wrote T-SQL queries using inner, outer, self and merge joins, and implemented removal of duplicate records using CTEs and ranking functions.
Environment: MS SQL Server 2008, T-SQL, DTS, SSIS/SSRS, SQL Server Enterprise Manager, SQL Profiler, MS Excel, MS Office 2007, Windows 7.