Sr. Big Data/Hadoop Architect Resume

Charlotte, NC


  • 10+ years of IT experience in relational database design, Core Java development, J2EE application development, and SQL and PL/SQL programming.
  • Excellent experience in setting up, configuring, and monitoring Hadoop clusters on the Cloudera and Hortonworks distributions and across the Hadoop ecosystem.
  • Excellent experience in installation, configuration, and administration of Hadoop clusters on major Hadoop distributions such as Cloudera Enterprise (CDH3 and CDH4) and Hortonworks Data Platform (HDP1 and HDP2).
  • Good experience in system monitoring, development, and support-related activities for Hadoop and Java/J2EE technologies.
  • Hands-on experience in developing applications with Java and J2EE (Servlets, JSP, EJB), SOAP, Web Services, JNDI, JMS, JDBC 2.0, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, and Oracle and MS SQL Server RDBMS.
  • In-depth knowledge of object-oriented programming (OOP) methodologies and features such as inheritance, polymorphism, exception handling, and templates, and development experience with Java technologies.
  • Very good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
  • Extensive experience in using Flume to transfer log data files to Hadoop Distributed File System (HDFS).
  • Very good experience working with Data Analytics using R and SAS.
  • Expertise in using tools such as Sqoop and Kafka to ingest data into Hadoop.
  • Extensive experience in supporting several database technologies, including SQL Server, MySQL, DynamoDB, and other non-relational databases.
  • Experienced in working with Hive/HQL to query data from Hive tables in HDFS, and successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Expertise in using the NoSQL databases HBase and Cassandra for storing large tables, bringing data into HBase using Pig and Sqoop.
  • Extensive experience using Maven and Ant as build tools to produce deployable artifacts (WAR and EAR) from source code.
  • Experienced in developing shell scripts and Python scripts for system management.
  • Excellent working experience in writing MapReduce programs in Java, and very good knowledge of Apache Cassandra and Pentaho.
  • Hands-on experience using Pig, Hive, and MapReduce to analyze large data sets, and scheduling Apache Hadoop jobs with the Oozie workflow manager.
  • Experienced in launching EC2 instances in Amazon EMR using the AWS console.
  • Expertise in creating UDFs, UDAFs, and UDTFs in Java, and in developing machine learning algorithms using Mahout for clustering and data mining.
  • Experienced in installation, configuration, and administration of Hadoop clusters.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Experienced in supporting Apache and Tomcat applications running on Linux and Unix servers.
  • Expertise in SQL programming: running SQL queries to gather information, creating database tables, and writing joins.
  • Extensive experience in Oracle database design and application development, with in-depth knowledge of SQL and PL/SQL; developed stored procedures, functions, packages, and triggers as backend database support for Java applications.
  • Experienced with multiple relational databases, primarily Oracle, SQL Server, and MySQL, and knowledgeable in NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development (TDD) and Agile Scrum.
  • Extensive experience in Business Requirements Analysis, Application Design, Development, Data Conversion, Data Migration, Implementation and different aspects of software development like Coding and Testing as both Developer and Analyst.
  • Extensive experience working with IDE tools such as Eclipse.
  • Experience in developing front ends using JavaScript, HTML, XHTML, and CSS, and very good knowledge of the JVM and performance measurement and tuning.
  • Able to develop and execute Chef, shell, and Python scripts.
  • Highly motivated self-starter with strong troubleshooting skills, quick learner, good technical skills and an excellent team player


Programming Languages: C, J2EE, Java, SQL, R, SAS, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting, Spark, Scala, Python, Chef.

Frameworks: MVC and Struts, Hibernate, Spring

Cloud Computing Services: AWS IAM, EC2, S3, Elastic Beanstalk, VPC, Instances, OpsWorks, Elastic Load Balancer (ELB), RDS (MySQL), AMI, SQS, SNS, SWF, Data security, Troubleshooting, DynamoDB, API Gateway, Direct Connect, CloudFront, CloudWatch, CloudTrail, Route 53, Sophos, LUKS

Web Tools & Technologies: XML Schema, SAX, DOM, SOAP, WSDL

Big Data Technologies: Hadoop, MapReduce, Sqoop, Hive, Flume, Oozie, Pig, Scala, Apache Spark, YARN, ZooKeeper, Impala, Kafka, Mahout, Falcon, Cassandra.

Databases: Oracle, MS SQL Server 7.0, MySQL

Operating Systems: UNIX, Red Hat Linux, Windows NT, Ubuntu 12.04, CentOS.

Application Development Tools: SQL Developer, SQL*Plus, Eclipse Kepler IDE


Confidential, Charlotte, NC

Sr. Big Data/Hadoop Architect

Roles and Responsibilities

  • Installed, configured, and maintained Hortonworks Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Involved in the design and development of various modules in the Hadoop big data platform, processing data using MapReduce, Hive, Pig, Sqoop, and Oozie.
  • Designed, developed, and tested MapReduce programs on mobile offer redemptions and sent the output to downstream applications such as HAVI; scheduled the MapReduce job through an Oozie workflow.
  • Ingested large volumes of XML files into Hadoop using DOM parsers within MapReduce. Extracted daily sales, hourly sales, and product mix of the items sold in stores and loaded them into the Global Data Warehouse.
  • Planned, designed, and launched a solution for building a Hadoop cluster in the cloud using AWS EMR.
  • Implemented a data lake project using Oracle FSDF on Hadoop with OFSAA.
  • Developed Python MapReduce programs for log analysis and wrote Hive UDFs in Python for complex functionality.
  • Extensively involved in installation and configuration of the Hortonworks Hadoop distribution, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
  • Scheduled multiple MapReduce jobs in Oozie. Extracted promotions data for stores in the USA by writing MapReduce jobs and automating them with UNIX shell scripts.
  • Used Python and SQL to build statistical models, including multivariate regression, linear regression, logistic regression, PCA, random forests, decision trees, and support vector machines.
  • Prepared Use Cases, UML diagrams and Vision diagrams.
  • Responsible for working with different teams in building Hadoop Infrastructure
  • Gathered business requirements in meetings for successful implementation of the POC and its move to production, and implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL), and Hive UDFs in Python.
  • Implemented different machine learning techniques in Scala using a Scala machine learning library.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Successfully loaded files into Hive and HDFS from Oracle, Netezza, and SQL Server using Sqoop.
  • Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
  • Developed simple to complex MapReduce streaming jobs in Python that were implemented using Hive and Pig.
  • Designed and created ETL jobs in Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
  • Analyzed data and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Developed machine learning algorithms using Mahout for data mining on data stored in HDFS.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS)
  • Worked with the Oozie workflow manager to schedule Hadoop jobs, including resource-intensive jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Extensively used HiveQL queries to query data in Hive tables and loaded data into Hive tables.
  • Created UDFs in Pig and Hive and applied partitioning and bucketing techniques in Hive for performance improvement.
  • Created indexes and tuned SQL queries in Hive, and handled database connections using Sqoop.
  • Involved in Hadoop NameNode metadata backups and load balancing as part of cluster maintenance and monitoring.
  • Used the file system check utility (fsck) to check the health of files in HDFS, and used Sqoop to import data from SQL Server to Cassandra.
  • Monitored nightly jobs that export data out of HDFS to be stored offsite as part of HDFS backup.
  • Used Pig to analyze large data sets and loaded the results back into HBase, and developed various automated scripts for data ingestion (DI) and data loading (DL) using Python and Java MapReduce.
  • Worked with various Hadoop Ecosystem tools like Sqoop, Hive, Pig, Flume, Oozie, Kafka
  • Developed Python Mapper and Reducer scripts and implemented them using Hadoop streaming.
  • Managed day-to-day support for the business user community and production jobs, including releases and operationalization.
  • Created schemas and database objects in Hive and developed Unix scripts for data loading and automation.
  • Integrated Hadoop with Informatica and SAS.

Environment: Hadoop, MapReduce, Sqoop, Python, AWS, EMR, EC2, Hive, MapR, Flume, Oozie, Pig, HBase, Scala, ZooKeeper 3.4.3, Talend Open Studio, Talend, OFSAA, Oracle 12c, Azure, Apache Cassandra, SQL Server 2012, MySQL, Java, SQL, PL/SQL, UNIX shell script, OLAP, Eclipse Kepler IDE, Microsoft Office 2010, MS Outlook 2010.

Confidential, Chicago, IL

Sr. Big Data/Hadoop Architect

Roles and Responsibilities

  • Designed and developed multiple MapReduce jobs in Java for complex analysis. Imported and exported data between HDFS and relational database systems using Sqoop.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Configured Flume to transport web server logs into HDFS. Also used Kite logging module to upload webserver logs into HDFS.
  • Developed UDFs for Hive and wrote complex Hive queries for data analysis.
  • Installed Hadoop in fully distributed and pseudo-distributed modes for a POC in the early stages of the project.
  • Hands-on coding: wrote and tested the code for the ingest automation process (full and incremental loads). Designed the solution and developed the data ingestion program using Sqoop, MapReduce, shell scripts, and Python.
  • Analyze, develop, integrate, and then direct the operationalization of new data sources.
  • Generated Scala and Java classes from the respective APIs so that they could be incorporated into the overall application.
  • Created Elastic MapReduce (EMR) clusters and configured AWS Data Pipeline with EMR clusters for scheduling the task runner and provisioning EC2 instances on both Windows and Linux.
  • Applied Spark transformations and Spark SQL on tables according to business rules, and created and scheduled Spark scripts in Scala and Python as per business rules.
  • Processed large data sets using the Hadoop cluster. Data stored on HDFS was preprocessed and validated using Pig, and the processed data was stored in the Hive warehouse, enabling business analysts to get the required data from Hive.
  • Used Oozie to automate the data loading into Hadoop Distributed File System. Designed & implemented Java MapReduce programs to support distributed data processing.
  • Developed Hive queries to join click stream data with the relational data for determining the interaction of search guests on the website
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Implemented MapR (ZooKeeper, CLDB, YARN, HDFS, Spark, Impala, MCS, Oozie, Hive, HBase).
  • Manipulated and cleansed data using Python.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Used Spark on YARN and compared its performance with MapReduce.
  • Involved in implementation of Hadoop Cluster and Hive for Development and Test Environment
  • Developed MapReduce programs in Java to search production logs and web analytics logs for use cases such as diagnosing application issues and measuring page download performance.
  • Demonstrated experience in design and implementation of predictive models using statistical modeling techniques such as hypothesis testing, confidence intervals, variance explained in models etc.
  • Migrated traditional MapReduce jobs to Spark jobs and worked on Spark SQL and Spark Streaming.
  • Performed data cleaning, validation, and exploration, and handled missing values using various SAS techniques.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Used ZooKeeper along with HBase.
  • Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data
  • Used HiveQL queries to provide ad hoc reports on data in Hive tables in HDFS.
  • Involved in admin-related issues of HBase and other NoSQL databases.
  • Integrated Hadoop with Tableau and SAS analytics to provide end users analytical reports
  • Handled the documentation of data transfers to HDFS from various sources (Sqoop, Flume, and Falcon).
  • Provided cluster coordination services through ZooKeeper.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Created custom user-defined functions in Python for Pig.
  • Worked with different teams on ETL, data integration, and migration to Hadoop.
  • Evaluated ETL (Talend) and OLAP tools and recommended the most suitable solutions based on business needs.
  • Extracted data from one or more source files and Databases using SAS macros and SAS SQL.
  • Performed source model generation, import model generation, creation of mappings, and process creation, and executed batches in OFSAA.
  • Implemented a POC with Hadoop; extracted data into HDFS with Spark.
  • Used different file formats such as text files, SequenceFiles, and Avro.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Provided developers with access to HDFS from sources such as Informatica, MDM, and SAS.

Environment: Hadoop, Hive, Pig, Kafka, Scala, Spark, Python, Cassandra, SAS, HBase, MongoDB, Sqoop, Flume, Falcon, Storm, Oracle 11g, Java, SQL, Oozie, EMR, OFSAA, OLAP, YARN, ZooKeeper, Eclipse Kepler IDE, Microsoft Office 2007, UNIX, MS Outlook 2007

Confidential, OH

Sr. Big Data/Hadoop Developer

Roles and Responsibilities:

  • Analyzed Business Requirements and Identified mapping documents required for system and functional testing efforts for all test scenarios.
  • Implemented core concepts of Java and J2EE technologies: JSP, Servlets, JSF, JSTL, EJB transaction implementation (CMP, BMP, and message-driven beans), JMS, Struts, Spring, Swing, Hibernate, JavaBeans, JDBC, XML, Web Services, JNDI, multithreading, Drools, etc.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools such as Hive, HSQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Linux ARCH.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Installed and configured Hive and implemented various business requirements by writing Hive UDFs.
  • Configured the application using Spring, Struts, Hibernate, DAOs, Action classes, and JavaServer Pages.
  • Configured Hibernate, Struts, and Tiles XML files.
  • Developed the application using Struts Framework that uses Model View Controller (MVC) architecture with JSP as the view.
  • Developed the presentation layer using JSF, JSP, HTML, CSS, and jQuery.
  • Extensively used Spring IOC for Dependency Injection and worked on Custom MVC Frameworks loosely based on Struts.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
  • Extensively worked on the user interface for a few modules using HTML, JSPs, JavaScript, and Python.
  • Implemented business logic using servlets and session beans, deployed them on WebLogic Server, and created complex SQL queries and stored procedures.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Developed the XML schema and Web services for the data support and structures.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
  • Used different file formats such as text files, SequenceFiles, and Avro, and used ZooKeeper to manage coordination among the clusters.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts
  • Used Impala to pull data from Hive tables and developed the front-end application with AngularJS.
  • Used Base SAS, SAS/Macro, and SAS/SQL to develop codes and generated various analytical reports using Excel.
  • Wrote Python scripts, managed application deployment using Python, configured Sqoop, and exported/imported data into HDFS.

Environment: J2EE, JDK, JSP, JSF, Scala, Python, Spark, MVC and Struts, OLAP, Eclipse IDE, Hibernate, Hadoop, MapReduce, HBase, Hive, Pig, Sqoop, SAS, Impala, ZooKeeper, SQL Developer, Oracle 10g, AngularJS, JavaScript, HTML5, CSS, SQL.


Sr. Java/Hadoop Developer

Roles and Responsibilities:

  • Responsible for writing functional and technical documents for the modules developed.
  • Extensively used J2EE design Patterns and used Agile/Scrum methodology to develop and maintain the project.
  • Developed GUI using JSP, Struts, HTML3, CSS3, XHTML, Swing and JavaScript to simplify the complexities of the application.
  • Developed and maintained web services using XMPP and SIP protocols.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Developed business logic using Spring MVC and developed DAO layer using Hibernate, JPA, and Spring JDBC.
  • Used Oracle 10g as the database and used Oracle SQL developer to access the database.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Used the jQuery library, Node.js, and AngularJS to create powerful dynamic web pages and web applications with advanced, cross-browser functionality.
  • Used an internal tool to design dataflows with the Cassandra and MongoDB NoSQL databases.
  • Developed all front-end screens using Ajax, JSP, JSP tag libraries, CSS, HTML, and JavaScript.
  • Extensively worked on the Spring and Hibernate frameworks and implemented complex MapReduce algorithms in Java.
  • Set up build and deployment automation for a Java-based project using Jenkins and Maven.
  • Implemented Struts tag libraries for HTML, beans, and tiles for developing user interfaces.
  • Developed the transaction forms using JSPs, Servlets, JSTL, and RESTful services.
  • Extensively used SoapUI for unit testing and was involved in performance tuning of the application.
  • Used Log4j for extensible logging, debugging, and error tracing, and used Oracle Service Bus to create the proxy WSDL and provide it to consumers.
  • Used ClearCase, TFS, and Git for version control and source code management, and used JMS with WebLogic Application Server.
  • Used UNIX scripts to create a batch processing scheduler for the JMS queue, and discussed new developments and errors with the client and the project manager.
  • Used a test-driven development (TDD) approach to develop the application, documented all modules, and deployed them to the server on time.
  • Involved in Production Support and Maintenance for Application developed in the RedHat Linux Environment.

Environment: Java, Spring, Hibernate, Python, XML, XSD, XSLT, WSDL, Web services, XMPP, SIP, JMS, SoapUI, Eclipse, IBM UDB, WebLogic, Oracle 10g, Node.js, Git, Jenkins, Oracle SQL Developer, MongoDB, Cassandra, MapReduce, Pig.


Java/J2EE Developer

Roles and Responsibilities:

  • Reviewed requirements with the support group and developed an initial prototype.
  • Involved in installation and configuration of Tomcat, SpringSource Tool Suite, and Eclipse, and in unit testing.
  • Involved in the analysis, design, and development of application components using JSP and Servlets following J2EE design patterns.
  • Designed the application using the Struts MVC architecture.
  • Designing, coding and configuring server side J2EE components like JSP, Servlets, Java Beans, XML.
  • Used Maven to build, configure, package, and deploy the application project, and integrated it with Jenkins.
  • Architected an enterprise service bus using Mule, Java (EJB), Hibernate, and Spring to tie back-end business logic/systems with web properties via a corresponding RESTful API.
  • Developed the web tier using Servlets, JSP, Struts, Tiles, JavaScript, HTML, and XML.
  • Used Front Controller design pattern for Domain blocking module. Also extensively used Singleton, DAO design patterns for enhancements to other modules.
  • Developed high traffic web applications using HTML, CSS, JavaScript, jQuery, Bootstrap, AngularJS, and Node.js.
  • Designed and developed Application based on Struts Framework using MVC design pattern.
  • Developed services which involved both producing and consuming web services (WSDL, SOAP and JAX-WS). Also published the WSDL to UDDI registry using the SOA architecture.
  • Developed PL/SQL stored procedures used by the Java DAO layer, and developed a UI mock prototype using HTML and JavaScript for the Domain Blocking module.
  • Used CVS for version control. Developed JavaServer Pages (JSPs) and generated HTML files.
  • Used SAX/DOM XML Parser for parsing the XML file and Communicated between different applications using JMS.
  • Extensively worked on PL/SQL, SQL and developed different modules using J2EE (Servlets, JSP, JDBC, JNDI)
  • Integrated the Application with Database using JDBC and used JNDI for registering and locating Java objects.
  • Developed and deployed EJBs such as entity beans and session beans, and performed functional, integration, and validation testing.

Environment: Java, JSP, Struts, Tiles, Servlets, Git, JavaScript, HTML, XML and XSL, Node.js, Eclipse IDE, Oracle Developer, CVS, Spring, Hibernate, EJB, Jenkins, SOAP, RESTful.
