Hadoop/Big Data Developer Resume

Atlanta, GA


  • 8+ years of experience in the IT industry, including 6 years as a Hadoop and Spark developer in domains such as financial services and healthcare. Maintained positive communication and working relationships at all levels. An enthusiastic, goal-oriented team player with excellent communication and interpersonal skills and a strong work ethic.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Flume and Hive for scalable, distributed and high-performance computing.
  • Experience in using Hive Query Language and Spark for data analytics.
  • Experienced in installing, maintaining and configuring Hadoop clusters.
  • Expertise in using various Hadoop tools such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume and Spark for data storage and analysis.
  • Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and in using UDFs from the Piggybank UDF repository.
  • Strong knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Data Platform 2.1 & 2.2, CDH5 with Cloudera Manager, and HDP on Linux, Ubuntu etc.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Supported WebSphere Application Server, WPS and IBM HTTP/Apache web servers in a Linux environment for various projects.
  • Good working knowledge of the MapReduce framework.
  • Strong knowledge of NoSQL column-oriented databases like HBase and their integration with Hadoop clusters.
  • Experience using build tools like Maven to build projects.
  • Good knowledge of Kafka, ActiveMQ and Spark Streaming for handling streaming data.
  • Experienced with job workflow scheduling and coordination tools like Oozie and ZooKeeper.
  • Analyze data, interpret results and convey findings in a concise and professional manner.
  • Partner with Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics
  • Promote full cycle approach including request analysis, creating/pulling dataset, report creation and implementation and providing final analysis to the requestor
  • Good exposure to data modelling, data profiling, data analysis, validation and metadata management.
  • Comfortable in Unix/Linux and Windows environments, working with operating systems like CentOS 5/6 and Ubuntu 13/14.
  • Good knowledge of Amazon AWS services like EMR and EC2, which provide fast and efficient processing of big data.
  • Worked with different file formats like JSON, XML, CSV, XLS etc.
  • Used Amazon AWS EMR and EC2 for cloud big data processing.
  • Experience with version control tools like GitHub.
  • Good experience in generating statistics and reports from Hadoop.
  • Sound knowledge of designing ETL applications using tools like Talend.
  • Experience in working with job schedulers like Oozie.
  • Strong in databases like MySQL, Teradata, Oracle, MS SQL.
  • Strong understanding of Agile and Waterfall SDLC methodologies.
  • Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.


Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro and BigQuery.

Languages: Core Java, XML, HTML and HiveQL.

J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, Dojo, JSON and BlazeDS.

Frameworks: Spring 2, Struts 2 and Hibernate 3.

XML Processing: JAXB

Reporting Tools: BIRT 2.2.

Application & Web Services: WebSphere 6.0, JBoss 4.X and Tomcat 5.

Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7 and Scala.

Database (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase and MongoDB 2.2.

IDE: Eclipse and EditPlus.

PM Tools: MS MPP, Risk Management, ESA.

Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.

EAI Tools: TIBCO 5.6.

Bug Tracking/Ticketing: Mercury Quality Center and ServiceNow.

Operating System: Windows 98/2000, Linux /Unix and Mac.


Confidential, Atlanta, GA

Hadoop/Big Data Developer


  • Worked on the Hadoop stack, ETL tools like Talend, reporting tools like Tableau, security with Kerberos, user provisioning with LDAP and many other big data technologies for multiple use cases.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Installed 5 Hadoop clusters for different teams and developed a data lake that serves as the base layer for developers to store and analyze data. Provided services to developers: installed their custom software, upgraded Hadoop components, resolved their issues, and helped troubleshoot their long-running jobs. Provided L3 and L4 support for the data lake and also managed clusters for other teams.
  • Built automation frameworks for data ingestion, processing and storage in Python and Scala with NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD and Red Hat infrastructure.
  • Served in a combined DevOps and Hadoop admin role: worked on L3 issues, installed new components as requirements arose, automated as much as possible, and implemented a CI/CD model.
  • Implemented security on a Hortonworks Hadoop cluster using Kerberos, working with the operations team to move a non-secured cluster to a secured cluster.
  • Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded data into HDFS. Set up Hadoop security using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
  • Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
  • Used R for effective data handling and storage. Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling. Designed cloud-hosted solutions using the AWS product suite.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance and performed capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
  • Created Teradata Database Macros for Application Developers which assist them to conduct performance and space analysis, as well as object dependency analysis on the Teradata database platforms
  • Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven & Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins and UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira and Cloud Foundry.
  • Defined Chef Server and workstation to manage and configure nodes.
  • Set up the Chef repo, Chef workstations and Chef nodes.
  • Ran Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.

Environment: Hortonworks Hadoop, Cassandra, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, SPSS, Unix Shell Scripts, ZooKeeper, SQL, MapReduce and Pig.


Hadoop/Big Data Developer


  • Drove digital products in the bank, including IoT for the campaigning system and blockchain for payments and trading.
  • Defined architecture standards, big data principles and PADS across the program, and the usage of VP for modelling.
  • Developed pig scripts to transform data and loaded into HBase tables.
  • Developed Hive scripts for implementing dynamic partitions.
  • Created Hive snapshot tables and Hive ORC tables from Hive tables.
  • In the data processing layer, data is finally stored in Hive tables in ORC file format using Spark SQL. This layer implements the logic for maintaining SCD Type 2 for non-transactional incremental feeds.
  • Developed a rule engine that adds columns to existing data based on business rules specified by reference data provided by the business.
  • Optimized Hive joins for large tables and developed MapReduce code for the full outer join of two large tables.
  • Used Spark to parse XML files, extract values from tags and load them into multiple Hive tables using map classes.
  • Used HDFS and MySQL, and deployed HBase integration to perform OLAP operations on HBase data.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Used Talend Open Studio for Big Data 5.6.2 to create a framework for executing extract jobs.
  • Monitored workload, job performance, and capacity planning.
  • Implemented partitioning and bucketing techniques in Hive
  • Used different big data components in Talend like throw, tHiveInput, tHDFSCopy, tHDFSPut, tHDFSGet, tMap, tDenormalize, tFlowToIterate etc. Scheduled different Talend jobs using TAC (Talend Administration Center).
  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools like HBase. Developed MapReduce programs to perform data filtering for unstructured data.
  • Loaded data from the UNIX file system to HDFS and wrote Hive User Defined Functions.
  • Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON, and Parquet.
  • Created multi-stage Map-Reduce jobs in Java for ad-hoc purposes
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Added, installed and removed components through Ambari.
  • Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.

Environment: Hadoop, MapReduce, TAC, HDFS, HBase, HDP (Hortonworks), Sqoop, Spark SQL, Hive ORC, Data Processing Layer, Hue, Azure, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, Cron, JSON, XML, Parquet.

Confidential, Tampa, Florida

Hadoop/Big Data Developer


  • Worked with Data Governance, Data quality, data lineage, Data architect to design various models and processes.
  • Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, space time.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop and MongoDB.
  • Prepared test data using black-box testing techniques (such as BVA and ECP).
  • Gathering all the data that is required from multiple data sources and creating datasets that will be used in the analysis.
  • Performed Exploratory Data Analysis and Data Visualizations using R, and Tableau.
  • Performed EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Developed, implemented and maintained conceptual, logical and physical data models using Erwin for forward- and reverse-engineered databases.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an Architect implemented MDM hub to provide clean, consistent data for an SOA implementation.
  • Coded R functions to interface with Caffe Deep Learning Framework
  • Performed data cleaning and imputation of missing values using R.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN and MapReduce.
  • Handled ad-hoc requests from different departments and locations.
  • Used Hive to store the data and perform data cleaning steps for huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Worked in the Amazon Web Services cloud computing environment.
  • Wrote complex HiveQL queries and Pig scripts to pull data as required and to validate data against report output.
  • Devised data load and security strategies and workflow designs with the help of administrators and other developers.
  • Used Tableau to automatically generate reports.
  • Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML and more.
  • Developed automated workflows to schedule jobs using Oozie.
  • Developed a technique to incrementally update Hive tables (a feature currently not supported by Hive).
  • Created metrics and executed unit tests on input, output and intermediate data.
  • Led the testing team and meetings with onshore for requirement gathering.
  • Assisted the team in creating documents describing the cluster setup process.
  • Established data architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.

Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, Cron.


Hadoop/Big Data Developer


  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Ran MapReduce programs on the cluster.
  • Loaded data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Loaded data from MySQL to HBase where necessary using Sqoop.
  • Configured the Hadoop cluster with a NameNode and slave nodes, and formatted HDFS.
  • Imported and exported data between Oracle and HDFS/Hive using Sqoop.
  • Performed source data ingestion, cleansing, and transformation in Hadoop.
  • Supported Map-Reduce Programs running on the cluster.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Developed Hive queries for analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
  • Created HBase tables to store various data formats of data coming from different portfolios.
  • Worked on improving the performance of existing Pig and Hive Queries.
  • Developed Hive UDFs and reused them for other requirements. Worked on join operations.
  • Developed SerDe classes.
  • Developed fingerprinting rules in Hive that help uniquely identify a driver profile.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Hive to partition and bucket data.

Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, Cron.