
Hadoop/Big Data Developer Resume

Fort Collins, CO


  • Over 8 years of IT experience in the analysis, design, development, implementation, and testing of software applications, including 4+ years of experience in Big Data and in the design and development of Java-based enterprise applications.
  • Strong skills in developing applications with Big Data technologies such as Hadoop, YARN, Flume, Hive, Pig, Sqoop, HBase, Cloudera, MapR, Avro, Spark, and Scala.
  • Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, and Pig.
  • Developed scripts and batch jobs to schedule Hadoop programs.
  • Experience in analysing data using HiveQL and custom programs in Java.
  • Hands-on experience using Sqoop to import data from databases such as Oracle and MySQL into HDFS and Hive, and to export it back.
  • Implemented Flume for collecting, aggregating and moving a large number of server logs and streaming data to HDFS.
  • Hands-on experience in Spark, Scala, and MarkLogic.
  • Extensively used design patterns to solve complex problems.
  • Developed Hive queries for data analysis to meet the business requirements.
  • Experience in extending Hive and Pig core functionality by writing custom functions (UDFs, UDAFs, and UDTFs).
  • Experienced in implementing security mechanisms for Hive data.
  • Extensively used ETL methodology for data profiling, data migration, extraction, transformation, and loading using Talend/SSIS, and designed data conversions from a wide variety of source systems.
  • Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tMysqlSCD, tFilter, tGlobalmap, etc.
  • Experience with Hive query performance tuning.
  • Experienced in improving data cleansing processes using Pig Latin transformations and join operations.
  • Extensive knowledge of NoSQL databases such as HBase.
  • Experienced in performing CRUD operations using the HBase Java client API and REST API.
  • Experience in designing both time-driven and data-driven automated workflows using Bedrock and Talend.
  • Good knowledge of creating PL/SQL stored procedures, indexes, packages, functions, triggers, and cursors with Oracle (9i, 10g, 11g) and MySQL.
  • Expert in designing and writing on-demand UNIX shell scripts.
  • Extensively worked with Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
  • Excellent Java development skills using J2EE frameworks such as Struts, EJBs, and Web Services.
  • Proficient in development methodologies such as Scrum, Agile, and Waterfall.
  • Passion to excel in any assignment, with good debugging and problem-solving skills.
  • Ability to work under high pressure and tight deadlines.
  • Excellent adaptability and ability to learn, with good analytical and programming skills.


Hadoop/Big Data Technologies: HDFS, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, and BigQuery.

Languages: Core Java, XML, HTML and HiveQL.

J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, Dojo, JSON and BlazeDS.

Frameworks: Spring 2, Struts 2 and Hibernate 3.

Application & Web Servers: WebSphere 6.0, JBoss 4.X and Tomcat 5.

Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7 and Scala.

Database (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase and MongoDB 2.2.

IDE: Eclipse and EditPlus.

PM Tools: MS MPP, Risk Management, ESA.

Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.

EAI Tools: TIBCO 5.6.

Bug Tracking/Ticketing: Mercury Quality Center and ServiceNow.

Operating System: Windows 98/2000, Linux/Unix and Mac.


Confidential, Fort Collins, CO.

Hadoop/Big Data Developer


  • Translated functional and technical requirements into detailed architecture and design.
  • Developed automated scripts for all jobs, starting with pulling data from mainframes into HDFS.
  • Designed and developed ETL applications and automated them using Oozie workflows and shell scripts with error handling and a mailing system.
  • Implemented a nine-node CDH4 Hadoop cluster on Ubuntu Linux.
  • Implemented programs that join data sets from different sources.
  • Optimized programs by tuning configuration parameters and implementing optimized joins.
  • Implemented MapReduce solutions such as Top-K, summarization, and data partitioning using design patterns.
  • Implemented MapReduce programs to handle different file formats such as XML, Avro, and sequence files, and implemented compression techniques.
  • Developed Hive queries according to business requirements.
  • Designed and created Hive internal tables and partitions to store structured data.
  • Developed custom Hive UDFs to incorporate business logic into Hive queries.
  • Used Hive-optimized file formats such as ORC and Parquet.
  • Implemented Hive serializers/deserializers (SerDes) to handle Avro files and used XPath expressions to handle XML files.
  • Imported and exported data from RDBMS through Sqoop.
  • Designed Cassandra data models for near-real-time analysis.
  • Configured Cassandra clusters, vnodes, and replication strategies, and configured the data model using DataStax Community.
  • Ensured NFS was configured for the NameNode.
  • Designed and implemented time-series data analysis using the Cassandra File System.
  • Implemented CRUD operations on Cassandra data using CQL and REST API.
  • Implemented import and export of structured data using Sqoop import/export options.
  • Implemented Sqoop saved jobs and incremental imports to import data.
  • Used compression techniques (Snappy) with file formats to make efficient use of storage in HDFS.
  • Used Cloudera Manager to monitor the cluster, debug jobs, and handle job submission on the cluster.
  • Successfully migrated a legacy application to a Big Data application using Hive/Pig at production level.

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Cloudera, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, and ZooKeeper.

Confidential, Boston, MA.

Hadoop/Big Data Developer


  • Involved in all phases of development activities from requirements collection to production support.
  • Developed a detailed understanding of the current system and identified the different sources of data for EMR.
  • Involved in cluster setup.
  • Performed batch processing of logs from various data sources.
  • Automated Cloudera job submission via Jenkins scripts and Chef.
  • Worked on predictive analytics to monitor inventory levels and ensure product availability.
  • Analysed customers' purchasing behaviours in JMS.
  • Supported value-added services based on clients' profiles and purchasing habits.
  • Worked on gathering and refining requirements, interviewing business users to understand and document data requirements including elements, entities, and relationships, in addition to visualization and report specifications.
  • Defined UDFs using Pig and Hive in order to capture customer behaviour.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, Spark SQL, Apache Pig, and Oozie.
  • Integrated Apache Kafka for data ingestion.
  • Created Hive external tables on the output using Scala and Hive, then applied partitioning and bucketing on top of them.
  • Provided graphs, generated with shell scripting, to show trends.
  • Maintained data import scripts and jobs using HBase and Hive.
  • Developed and maintained several batch jobs that run automatically depending on business requirements.
  • Imported and exported data between environments such as MySQL and HDFS; performed unit testing and deployment for internal use, and monitored the performance of the solution.

Environment: Apache Hadoop, Cloudera Distribution, RHEL, Hive, HBase, Pig, HDFS, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and Teradata.

Confidential, Provo, Utah

Hadoop Developer


  • Developed a detailed understanding of the existing build system and the related tools that provide information on products, releases, and test results.
  • Designed and implemented jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Developed UDFs to provide custom Hive and Pig capabilities using SOAP/RESTful services.
  • Built a mechanism with Talend for automatically moving existing proprietary binary-format data files to HDFS using a service called the Ingestion service.
  • Implemented a prototype to integrate PDF documents into a web application using GitHub.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, Scrum, and data manipulation.
  • Performed data transformations in Scala and Hive, and used partitions and buckets for performance improvements.
  • Wrote custom InputFormat and RecordReader classes for reading and processing the binary format.
  • Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
  • Implemented a custom file loader for Pig so that large data files such as build logs could be queried directly.
  • Used Python for pattern matching in build logs to format errors and warnings.
  • Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
  • Created Hive external tables on the output, then applied partitioning and bucketing on top of them.
  • Improved performance by tuning Scala, Hive, and MapReduce jobs using Talend, ActiveMQ, and JBoss.
  • Developed a daily test engine using Python for continuous tests.
  • Used shell scripting for Jenkins job automation with Talend.
  • Built a custom calculation engine that can be programmed according to user needs.
  • Ingested data into Hadoop using shell scripting and Sqoop, and applied data transformations using Pig and Hive.
  • Handled performance improvement changes to the Pre-Ingestion service, which is responsible for generating the Big Data-format binary files from an older version of Historian.
  • Worked with support teams and resolved operational and performance issues.
  • Researched, evaluated, and utilized new technologies, tools, and frameworks around the Hadoop ecosystem.
  • Prepared graphs from test results posted to MIA.

Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and CDH4.x.

Confidential, North Bergen, New Jersey.

Hadoop Developer


  • Worked on analysing the Hadoop cluster using different Big Data analytic tools including Hive and Pig.
  • Installed and configured the Hadoop cluster using Cloudera's CDH distribution and monitored cluster performance using Cloudera Manager.
  • Monitored workload and job performance and performed capacity planning using Cloudera Manager.
  • Implemented schedulers on the JobTracker to share cluster resources among the jobs submitted to the cluster.
  • Involved in designing and developing the Hive data model, loading it with data, and writing Java UDFs for Hive.
  • Developed solutions for importing and exporting data into HDFS, analysed the data using Hive, and produced summary results from Hadoop for downstream systems.
  • Used Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and RDBMS.
  • Created Hive tables and loaded data from HDFS into Hive tables as per the requirements.
  • Established custom programs to analyse data and used HQL queries for data cleansing.
  • Created components such as Hive UDFs for functionality missing in Hive, to analyse and process large volumes of data extracted from the NoSQL database Cassandra.
  • Collected and aggregated substantial amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Developed a workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Optimized algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, Scrum, and data manipulation.

Environment: Cloudera Distribution (CDH), HDFS, Pig, Hive, MapReduce, Sqoop, HBase, Impala, Java, SQL, Cassandra.


Java Developer


  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Analysed, designed, and developed a J2EE-based application using Struts and Tiles, Spring 2.0, and Hibernate 3.0.
  • Interacted with the Business Analyst and Architect during sprint planning sessions.
  • Used XML Web Services for transferring data between different applications.
  • Used the Apache CXF web service stack for developing web services, and SoapUI and XMLSpy for testing them.
  • Used JAXB for binding XML to Java.
  • Used SAX and DOM parsers to parse XML data.
  • Used Hibernate for object-relational mapping with the Oracle database.
  • Worked with Spring IoC for injecting beans, which reduced the coupling between classes.
  • Implemented Spring IoC (Inversion of Control)/DI (Dependency Injection) for wiring object dependencies across the application.
  • Implemented Spring transaction management for the application.
  • Implemented the Service Locator design pattern.
  • Performed unit testing using JUnit 3 and the EasyMock testing framework.
  • Worked on PL/SQL stored procedures using PL/SQL Developer.
  • Involved in fixing production defects for the application.
  • Used Ant as the build tool for building J2EE applications.

Environment: HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, PL/SQL, TOAD, Java.


Data Analyst


  • Involved in the design and implementation of server-side programming.
  • Gathered and analysed requirements and prepared high-level documents.
  • Participated in all client meetings to understand the requirements.
  • Actively involved in design and data modelling using the Rational Rose tool (UML).
  • Involved in the design of the SPACE database.
  • Designed and developed user interfaces and menus using HTML, JSP, JSP custom tags, and JavaScript.
  • Implemented the user interface using the Spring Tiles framework.
  • Fetched case details from the Tuxedo server using web services technology (i.e., binding, finding a service, and using the XML message format).
  • Involved in integrating the system with BT's systems such as GTC and CSS, or through the e Link hub and IBM MQSeries.
  • Developed, deployed, and tested JSPs and Servlets in WebLogic.
  • Used Eclipse as the IDE, integrated WebLogic with Eclipse to develop and deploy the applications, and used JDBC to connect to the database.

Environment: Struts Framework, Java 1.3, XML, Data Modelling, JDBC, SQL, PL/SQL, JMS, Web Services, SOAP, Solaris 9, Ant, TOAD, Eclipse.
