Sr. Big Data Architect Resume

Houston, TX


  • Over 9 years of experience in development, design, integration, and presentation with Java, along with 4 years of Big Data/Hadoop experience across the Hadoop ecosystem, including Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, and Kafka, as well as Python and AWS.
  • Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML, and CSS, and in working with servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
  • Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib, and Cassandra.
  • Technologies worked on extensively in software development include Struts, Spring, CXF REST API, web services, SOAP, XML, JMS, JSP, JNDI, Apache, Tomcat, JDBC, and databases such as Oracle and Microsoft SQL Server.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Expertise in architecting Big Data solutions covering data ingestion and data storage.
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.
  • Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript, and CSS3.
  • Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Experienced in building high-performance, reliable distributed applications with Akka in Java and Scala.
  • Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Good experience in Shell programming.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers.
  • Experience in developing Big Data projects using open-source tools and technologies such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
  • Architecting, solutioning, and modeling DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib, and Cassandra.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Strong expertise in AWS services including EC2, DynamoDB, S3, and Kinesis.
  • Expertise in Big Data architecture, including Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • Hands-on experience with Hadoop/Big Data technologies for storing, querying, processing, and analyzing data.
  • Experienced in application development using Java, J2EE, JDBC, Spring, and JUnit.
  • Experienced in using Hadoop components such as MapReduce, Hive, Sqoop, and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Experienced in testing data in HDFS and Hive for each transaction of data.
  • Strong experience working with databases such as Oracle 12c/11g/10g/9i, DB2, SQL Server, and MySQL, and proficiency in writing complex SQL queries.
  • Experienced in using database tools like SQL Navigator, TOAD.


Java/J2EE Technologies: JSP, Servlets, jQuery, JDBC, JavaScript.

Hadoop/Big Data: HDFS, Hive, Pig, HBase, MapReduce, ZooKeeper, Scala, Akka, Kafka, Storm, MongoDB, Sqoop, Oozie, Flume.

Languages: Java, J2EE, HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.

Operating Systems: Linux, Windows, UNIX, Ubuntu, CentOS, Sun Solaris.

NoSQL Databases: MongoDB, DynamoDB, Cassandra.

Web Technologies: HTML, XML, DHTML, XHTML, CSS, XSLT.

Web/Application Servers: Apache Tomcat 6.0/7.0/8.0, JBoss.

Frameworks: MVC, Struts, Spring, Hibernate.

Databases: Microsoft Access, MS SQL, Oracle 12c/11g/10g/9i.



Confidential, Houston, TX

Sr. Big Data Architect


  • Involved in the design and architecture of Big Data solutions using the Hadoop ecosystem.
  • Worked on a unified data lake architecture integrating various data sources on Hadoop.
  • Performed EDW assessment using Attunity, Cloudera, and Gluent tools.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Wrote Scala code to run Spark jobs on a Hadoop HDFS cluster.
  • Designed and deployed full SDLC of AWS Hadoop cluster based on client's business need.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with cloud architecture.
  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Integrated the NoSQL database HBase with MapReduce to move bulk data into HBase.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
  • Explored Spark for improving the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the NoSQL database HBase to load Hive tables into HBase tables through Hive-HBase integration, for consumption by the data science team.
  • Implemented solutions on AWS using services such as EC2, S3, RDS, Redshift, and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
  • Identified query duplication, complexity, and dependencies to minimize migration effort.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
  • Performed Big Data analysis using Pig and user-defined functions (UDFs).
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Experience with different Hadoop distributions, including Cloudera (CDH3 and CDH4), Hortonworks (HDP), and MapR.
  • Experience in integrating Oozie logs into a Kibana dashboard.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Imported millions of structured records from relational databases using Sqoop import, processed them with Spark, and stored the data in HDFS in CSV format.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Developed a Spark Streaming application to pull data from the cloud into a Hive table.
  • Used Spark SQL to process large amounts of structured data.
  • Assigned names to each column using Scala case classes.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Expert in writing business analytics scripts using Hive SQL.

Environment: Big Data, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.

Confidential, NYC, NY

Big Data Engineer


  • Worked with Hadoop Ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Utilized AWS services with a focus on Big Data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Experience in data cleansing and data mining.
  • Designed AWS architecture and cloud migration using AWS EMR, DynamoDB, and Redshift, with event processing using Lambda functions.
  • Loaded all data from relational databases into Hive using Sqoop. Received four flat files from different vendors, each in a different format (e.g. text, EDI, and XML).
  • Ingested data into Hadoop/Hive/HDFS from different data sources.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • The objective of the project was to build a cloud-based data lake in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Worked in AWS cloud and on-premises environments, with infrastructure provisioning and configuration.
  • Wrote Perl scripts covering data feed handling, implementing mark logic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop; maintained the cluster and managed and reviewed Hadoop log files.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
  • Designed the Redshift data model and carried out Redshift performance analysis and improvements.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Performed File system management and monitoring on Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed customized classes for serialization and deserialization in Hadoop.

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL, and Python.

Confidential, Bronx, NY

Java/Big Data Developer


  • Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.
  • Involved in the coding and integration of several business-critical modules of the CARE application using Java, Spring, Hibernate, and REST web services on WebSphere application server.
  • Developed web components using JSP, Servlets, and JDBC.
  • Designed and developed Enterprise Eligibility business objects and domain objects with Object Relational Mapping framework such as Hibernate.
  • Developed the web-based Rich Internet Application (RIA) using Java/J2EE (Spring Framework).
  • Participated in JAD meetings to gather the requirements and understand the End Users System.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Generated XML Schemas and used XML Beans to parse XML files.
  • Modified the existing JSP pages using JSTL.
  • Developed web pages using JSPs and JSTL to help end user make online submission of rebates. Also used XML Beans for data mapping of XML into Java Objects.
  • Experience working with big data and real time/near real time analytics and big data platforms like Hadoop, Spark using programming languages like Scala and Java.
  • Developed and implemented new UIs using AngularJS and HTML.
  • Involved in Big Data Project Implementation and Support.
  • Developed Spring configuration for dependency injection using Spring IoC and Spring controllers.
  • Implemented Spring MVC and IoC methodologies.
  • Used JNDI for naming and directory services.
  • Involved in the coding and integration of several business critical modules of application using Java, Spring, Hibernate and REST web services on WebSphere application server.
  • Delivered Big Data products, including re-platforming the legacy Global Risk Management System with Big Data technologies such as Hadoop, Hive, and HBase.
  • Developed RESTful web services using JAX-RS with the DELETE, PUT, POST, and GET HTTP methods in a Spring 3.0 and OSGi integrated environment.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through inversion of control (IoC).
  • Developed optimal strategies for distributing web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Used Java Message Service (JMS) for reliable, asynchronous exchange of important information, such as payment status reports, on the IBM WebSphere MQ messaging system.
  • Used Spring JDBC DAO as the data access technology to interact with the database.
  • Developed unit and E2E test cases using Node.js.
  • Developed the presentation layer using the JavaServer Faces (JSF) MVC framework.
  • Worked with the NoSQL database MongoDB and worked heavily on Hive, HBase, and HDFS.

Environment: JSP 2.1, Hadoop 1.x, Hive, Pig, HBase, JSTL 1.2, Java, J2EE, Java SE 6, UML, Servlets 2.5, Spring MVC, Hibernate, JSON, Unix, JUnit, DB2, Oracle, RESTful web services, jQuery, AJAX, AngularJS, JAXB, IRAD WebSphere Integration Developer, WebSphere 7.0.

Confidential, Louisville, KY

Sr. Java/J2EE Developer


  • Created user-friendly GUIs and web pages using HTML, AngularJS, jQuery, and JavaScript.
  • Expertise in designing and creating RESTful APIs using Apache Solr and Spring WS.
  • Developed and modified database objects as per the requirements.
  • Designed and developed a SOAP web services client using Axis to send service requests to web services. Invoked web services from the application to get data.
  • Developed screens using jQuery, JSP, JavaScript, AJAX, and ExtJS.
  • Implemented Log4j for logging in the project; used Ant and Maven to compile and package the application and to automate build and deployment scripts.
  • Created POJO classes, JavaBeans, and EJB beans, and wrote JUnit test cases to test code against the acceptance criteria throughout the development and testing phases.
  • Used Git for version control to maintain the code repository.
  • As part of AngularJS development, used data binding, developed controllers, directives, and filters, and integrated them with the back-end services.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Worked on importing the unstructured data into the HDFS using Flume.
  • Wrote complex Hive queries and UDFs.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Developed an intranet web application using J2EE architecture, with JSP for the user interfaces, JSP tag libraries to define custom tags, and JDBC for database connectivity.
  • Extensively worked on OIM Connectors like Active Directory, ED, IBM RACF, RSA, OID, OIF, Database User Management, CA Top Secret Advanced and Flat File.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Developed Shell scripts to ease execution of all other scripts (Pig, Hive, and MapReduce) and to move data files into and out of HDFS.
  • Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, Pig, and Sqoop actions.
  • Initiated and successfully completed a proof of concept on Flume for pre-processing, increased reliability, and ease of scalability over traditional MSMQ.
  • Used Flume to collect log data from different sources and transfer the data to Hive tables using different SerDes to store it in JSON, XML, and SequenceFile formats.

Environment: Java, JSF, Spring, Hibernate, Linux Shell Script, JAX-WS, SOAP, WSDL, CSS3, HTML, JBoss, Rally, Hudson, XML, ClearCase, ClearQuest, MyEclipse, Ant, Linux, Oracle 10g.


Java Developer


  • Developed user stories using Core Java and Spring 3.1 and consumed REST web services exposed from the profit center.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Used JSP, HTML, JavaScript, AngularJS, and CSS3 for content layout and presentation.
  • Developed the XML Schema and web services for data maintenance and structures.
  • Wrote test cases in JUnit for unit testing of classes.
  • Developed JavaScript behavior code for user interaction.
  • Created database programs in SQL Server to manipulate data accumulated by internet transactions.
  • Wrote servlet classes to generate dynamic HTML pages.
  • Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server.
  • Developed the presentation layer for browsers using CSS and HTML taken from Bootstrap.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
  • Developed the UI using HTML, JavaScript, and JSP, and developed business logic and interfacing components using business objects, XML, and JDBC.
  • Created rapid prototypes of interfaces to be used as blueprints for technical development.
  • Used UML diagrams (use case, object, class, state, sequence, and collaboration) to design the application using object-oriented analysis and design.
  • Worked extensively with JSPs and Servlets to accommodate all presentation customizations on the front end.
  • Developed JSPs for the presentation layer.
  • Created DML statements to insert and update data, and DDL statements to create and drop tables, in the Oracle database.
  • Configured Hibernate for storing objects in the database, retrieving objects, querying objects and persisting relationships between objects.
  • Used the Spring Core and Spring Web frameworks. Created numerous back-end classes.
  • Worked with the DOM and DOM functions using Firefox and the IE Developer Toolbar.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in developing web pages using HTML and JSP.
  • Provided technical support for production environments: resolving issues, analyzing defects, and providing and implementing solutions.
  • Used Oracle 10g as the back-end database on UNIX.
  • Did core Java coding using JDK 1.3, the Eclipse Integrated Development Environment (IDE), ClearCase, and Ant.

Environment: Java, XML, HTML, JavaScript, JDBC, CSS, SQL, PL/SQL, Web MVC, Eclipse, Ajax, jQuery, Spring with Hibernate, ActiveMQ, Ant, MySQL.
