
Hadoop Developer Resume

Atlanta, GA


  • 8+ years of professional experience in the IT industry, including 3 years of experience in Hadoop ecosystem implementation, maintenance, ETL and Big Data analysis operations.
  • Experience in installation, configuration, monitoring and administration of Hadoop ecosystems such as Yarn, MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Oozie, Flume, Spark, Zookeeper and Solr for data storage and analysis.
  • Extensive experience in cluster planning, installing, configuring and administering Hadoop cluster for major Hadoop distributions like Cloudera and Hortonworks.
  • Experience in running Hadoop streaming jobs to process terabytes of XML and/or JSON data.
  • In-depth knowledge of NoSQL technologies such as HBase, CouchDB, Cassandra and MongoDB.
  • In-depth knowledge of modifications required in static IP (interfaces), hosts and bashrc files, setting up password-less SSH and Hadoop configuration for Cluster setup and maintenance.
  • Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop
  • Hands-on experience in writing complex MapReduce programs in Python to perform analytics based on common patterns including joins, sampling, data organization, filtering and summarization.
  • Strong Knowledge of Amazon Web Services and Microsoft Azure
  • Experience in real-time monitoring and alerting of applications deployed in AWS using CloudWatch, CloudTrail and Simple Notification Service (SNS).
  • Experience in provisioning highly available, fault-tolerant and scalable applications using AWS Elastic Beanstalk, Amazon RDS, Elastic Load Balancing, Elastic MapReduce and Auto Scaling.
  • Experience in building policies for access control and user profiles using AWS IAM, S3 controls with bucket policies.
  • Good understanding of building Big Data/Hadoop applications using AWS services such as Amazon S3, EMRFS, EMR and RDS.
  • Hands-on experience in creating real-time data streaming solutions using Apache Spark with Python (PySpark).
  • Experience in handling messaging services using Apache Kafka
  • Experienced working with different file formats: Avro, SequenceFile and JSON.
  • Good knowledge and experience on Microsoft Business Intelligence (SSIS, SSRS, SSAS)
  • Strong analytical and problem-solving skills with excellent oral and written communication skills.
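The Hadoop Streaming and Python MapReduce experience above can be illustrated with a minimal sketch. This is not code from any actual project, just the standard mapper/reducer word-count pattern in pure Python with made-up sample data; in Hadoop Streaming the two functions would be separate scripts reading stdin.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every token."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum counts per key. Hadoop delivers mapper output
    sorted by key, so grouping adjacent pairs is sufficient; we sort
    here to simulate the shuffle step."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ["big data big cluster", "data pipeline"]  # hypothetical input
    print(dict(reducer(mapper(sample))))
```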


Hadoop Ecosystems: MapReduce v1, YARN, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Flume, Storm, Cassandra

Programming Languages: T-SQL, PL/SQL, PostgreSQL, C, C++, Java.

Scripting Languages: JavaScript, Python, PowerShell


Tools: Eclipse, MS Visual Studio

Platforms: Windows, Linux

NoSQL Technologies: CouchDB, Cassandra, DynamoDB

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

Testing Tools: NetBeans, Eclipse

Server Tools: SSMS, SSRS, SSIS, Database Tuning Advisor (DTA), SQL Profiler, DMVs.


Confidential, Atlanta, GA



  • Developed and executed custom MapReduce programs, Pig Latin scripts and HQL queries.
  • Used Hadoop FS scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
  • Analyzed business requirements and cross-verified them with functionality and features of NoSQL databases like HBase, Cassandra to determine the optimal DB.
  • Monitored workload, job performance and node health using Cloudera Manager.
  • Used Flume to collect and aggregate weblog data from different sources and push it to HDFS.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive, Pig and Sqoop.
  • Developed Pig UDFs to pre-process data for analysis
  • Worked with business teams and created Hive queries for ad hoc access.
  • Responsible for creating Hive tables, partitions, loading data and writing hive queries.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Maintained cluster co-ordination services through ZooKeeper.
  • Generated summary reports utilizing Hive and Pig and exported these results via Sqoop for Business reporting and Intelligence analysis
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format.
  • Created Hive, Phoenix and HBase tables, as well as HBase-integrated Hive tables, per the design using the ORC file format and Snappy compression.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, exporting results back to OLTP systems through Sqoop.
  • Built and managed large-scale private cloud deployments.
  • Provisioned S3 storage strategies such as versioning, lifecycle policies, cross-region replication and Glacier using the command line interface (CLI) and the AWS Management Console.
  • Created RDS DB instances using Multi-AZ deployments. Tested DB instance failover using reboot with failover.
  • Built customized Amazon Machine Images (AMIs), deployed them to multiple regions and launched EC2 instances from these custom images.
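The Hive partitioning and CSV-in-HDFS work described above follows a common layout: rows are grouped by a partition column and written under `key=value` directories. A minimal pure-Python sketch of that idea, with hypothetical sample records (no Hadoop dependency; real jobs would write the CSV files into HDFS partition directories):

```python
import csv
import io
from collections import defaultdict

def partition_records(records, partition_key):
    """Group rows by a partition column and render each group as CSV,
    mimicking the dt=YYYY-MM-DD directory layout of Hive partitions."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec[partition_key]].append(rec)
    out = {}
    for value, rows in parts.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        # Key names the partition directory, e.g. "dt=2015-01-01"
        out[f"{partition_key}={value}"] = buf.getvalue()
    return out

if __name__ == "__main__":
    data = [  # hypothetical rows pulled via Sqoop
        {"id": "1", "dt": "2015-01-01", "amt": "10"},
        {"id": "2", "dt": "2015-01-02", "amt": "20"},
        {"id": "3", "dt": "2015-01-01", "amt": "5"},
    ]
    layout = partition_records(data, "dt")
    print(sorted(layout))  # ['dt=2015-01-01', 'dt=2015-01-02']
```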

Confidential, Atlanta, GA



  • Installed and configured Hadoop tools such as HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Kafka and Oozie.
  • Loaded data from LINUX file system to HDFS. Imported and exported data into HDFS and Hive using SQOOP, processed data in HDFS using Impala (in Hue Interface).
  • Processed and analyzed data using MapReduce jobs.
  • Created Hive tables to store the processed results in tabular format.
  • Designed and implemented an Apache Spark streaming application.
  • Worked on pulling data from a MySQL database into HDFS using Sqoop.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis. Developed MapReduce programs to run on the cluster and processed unstructured data using Pig and Hive. Assisted in monitoring the Hadoop cluster using tools like SSIS.
  • Implemented test shell scripts to support test-driven development and continuous integration; scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Dumped data to Cassandra using Kafka; created Cassandra tables to store data in various formats arriving from different portfolios.
  • Involved in ETL of large datasets (terabytes) of structured, semi-structured and unstructured data.
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data; utilized cluster coordination services through ZooKeeper.
  • Involved in developing Pig scripts to extract data from the web server and performed transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Dumped online transfer data to HBase using Kafka. Handled data imported from HBase and performed transformations using Hive; created Hive external tables, loaded the data into them and queried it using HQL.
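The Spark streaming and Kafka-to-Cassandra work above boils down to discretizing an event stream into micro-batches and keeping stateful aggregates across them. A dependency-free Python sketch of that pattern, with hypothetical event data (a real job would consume a Kafka topic and use Spark's stateful operators):

```python
from collections import Counter

def micro_batches(stream, batch_size):
    """Chunk an event stream into fixed-size micro-batches, the way
    Spark Streaming discretizes a Kafka topic into small batches."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

def running_counts(stream, batch_size):
    """Yield a running per-key count after each micro-batch,
    in the spirit of updateStateByKey-style stateful aggregation."""
    state = Counter()
    for batch in micro_batches(stream, batch_size):
        state.update(event["type"] for event in batch)
        yield dict(state)

if __name__ == "__main__":
    events = [{"type": "click"}, {"type": "view"}, {"type": "click"},
              {"type": "click"}, {"type": "view"}]  # hypothetical events
    for snapshot in running_counts(events, 2):
        print(snapshot)
```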

Confidential, Atlanta, GA

Junior Database Administrator


  • Tasks included upgrading from SQL Server 2005 to SQL Server 2008, backup and restore procedures, migrating databases, testing replication, performance monitoring to resolve bottlenecks, resolving security issues at the enterprise level and locally, analyzing data via T-SQL queries, log shipping, utilizing DMVs for analysis, using SQL Profiler to monitor and measure queries, index optimization, and monitoring SQL Servers to increase performance.
  • Involved in capacity planning, sizing and database growth projections
  • Scheduled full and transaction log backups for user-created and system databases in the production environment using the Database Maintenance Plan wizard.
  • Used Data Transformation Services (DTS)/SQL Server Integration Services (SSIS), SQL Server's Extract, Transform, Load (ETL) tool, to populate data from data sources, creating packages for the application's various data-loading operations.
  • Involved in maintaining, monitoring, and troubleshooting SQL Server performance issues.
  • Assisted other DBAs in installing, testing and deploying SSRS for reporting.
  • Worked closely with the network administrator and senior developer in resolving issues related to capacity planning and procedures enhancing overall performance.
  • Responsible for implementing new methods of automating various maintenance processes related to the SQL Server environment.
  • Involved in tasks like re-indexing, checking data integrity, backup and recovery.
  • Experienced with T-SQL (writing procedures, triggers and functions).
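The index-optimization and query-analysis work described above rests on one principle: an index on the filtered column turns a full table scan into a seek. A small illustrative sketch, using Python's built-in sqlite3 as a stand-in for SQL Server (table, index and data are hypothetical; in T-SQL the equivalents are CREATE INDEX and the execution plan):

```python
import sqlite3

# sqlite3 stands in for SQL Server here; the indexing principle is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 45.0)],
)
# Without an index this predicate forces a full table scan;
# the index lets the engine seek directly to matching rows.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer = ?", ("acme",)
).fetchone()
total = conn.execute(
    "SELECT SUM(total) FROM orders WHERE customer = ?", ("acme",)
).fetchone()[0]
print(plan[-1])  # plan detail names ix_orders_customer, i.e. an index seek
print(total)     # 200.0
```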

Confidential, Midland TX

Junior Database Administrator

  • Involved in capacity planning, sizing and database growth projections
  • Scheduled and maintained routine jobs, alerts and maintenance plans.
  • Created and managed users, roles and groups and handled database security.
  • Managed daily backups and performed recovery on request.
  • Reviewed SQL Server and SQL Server Agent error logs.
  • Troubleshot high-availability (HA) issues with log shipping and clustering.
  • Created SSIS packages to keep data in sync between different domains and to migrate data from heterogeneous environments.
  • Created and modified ETL packages using SSIS and DTS.
  • Developed and managed SQL Server Reporting Services reports based on business requirements.
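The keep-in-sync SSIS packages above reduce to a classic diff: compare keyed rows in source and target, then derive the inserts, updates and deletes to apply. A minimal Python sketch of that core logic, with hypothetical data (a real package would read both sides from their databases and apply the changes):

```python
def diff_rows(source, target):
    """Compute the insert/update/delete sets needed to bring `target`
    in line with `source` — the core of a keep-in-sync ETL package.
    Both arguments map a business key to a row value."""
    inserts = {k: v for k, v in source.items() if k not in target}
    updates = {k: v for k, v in source.items()
               if k in target and target[k] != v}
    deletes = [k for k in target if k not in source]
    return inserts, updates, deletes

if __name__ == "__main__":
    src = {1: "alice", 2: "bob-new", 4: "dave"}   # hypothetical source rows
    tgt = {1: "alice", 2: "bob", 3: "carol"}      # hypothetical target rows
    print(diff_rows(src, tgt))  # ({4: 'dave'}, {2: 'bob-new'}, [3])
```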


Computer Engineer

  • Designed, installed and configured fundamental networks for small businesses and homes to share resources, addressed security issues, installed routers and hubs, and troubleshot connection problems.
  • Additionally, created small databases using Microsoft Access for small businesses and homes.
  • Taught clients how to troubleshoot, configure and manage their small computer environments.
  • Taught students computer programming languages.
