
Sr. Big Data Architect Resume

Irving, TX


  • Over 8 years of IT experience in analysis, design, and development using Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, Impala, Spark, Scala), Java, and J2EE.
  • Experienced in implementing end-to-end Hadoop solutions and analytics on Hadoop distributions such as Cloudera Distribution of Hadoop (CDH), Hortonworks Sandbox (HDP), and MapR.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, Solr, and Splunk.
  • Extensive experience in Java and J2EE technologies such as Servlets, JSP, JSF, JDBC, JavaScript, ExtJS, Hibernate, and JUnit testing.
  • Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers such as Apache Tomcat.
  • Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, and EMR), Hadoop, and Spark, and in the effective use of MapReduce, SQL, and Cassandra to solve big data problems.
  • Experienced with different Hadoop distributions such as Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
  • Experienced with Hadoop/Hive on AWS, using both EMR and non-EMR Hadoop on EC2.
  • Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, Solr, Git, Maven, Avro, JSON, and Chef.
  • Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java, and in extending Hive and Pig core functionality with custom UDFs.
  • Experienced in working with Hadoop architects and big data users to implement new Hadoop ecosystem technologies to support multi-tenant clusters.
  • Experienced in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
  • Diverse experience using Java tools in business, web, and client-server environments, including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC) technologies.
  • Excellent knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2 (YARN).
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Extensive experience using Sqoop to import data from RDBMS into HDFS and vice versa.
  • Experienced with R and Python (pandas, NumPy, scikit-learn) for statistical computing. Also experienced with MLlib (Spark), MATLAB, Excel, Minitab, SPSS, and SAS.
  • Experienced in implementing Service-Oriented Architecture (SOA) using Web Services and JMS (Java Message Service).
  • Experienced in MVC (Model-View-Controller) architecture and various J2EE design patterns such as Singleton and Factory.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Extensive experience with NoSQL databases such as MongoDB and Cassandra, and in writing Apache Spark Streaming applications on Big Data distributions in an active cluster environment.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.


Operating Systems: Linux, Windows, Mac OS, Unix

Infrastructure Design: Microsoft Visio, MySQL Workbench

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Spark SQL, HBase, Apache Crunch, Solr, Sqoop, Spark Streaming, Spark, Oozie, ZooKeeper, Hue, Avro, JSON.

Database Servers: Teradata, MySQL, Oracle, MS SQL Server 2005/2008/2012

ETL Design Tools: Teradata Load Utilities (BTEQ, FastLoad, MultiLoad), SSIS, Informatica.

Report Design Tool: SSRS

Application Servers: JBoss 7.1, WebSphere 6.x, WebLogic 11g, JBoss 5.0

Programming Languages: Java, Shell Scripts, Scala, Python, R.

Web Technologies: Servlets, JSP, JSTL, JDBC

IDEs: Eclipse, NetBeans, RAD, JDeveloper, TOAD, SQL Developer

Frameworks: Spring 3.0/2.5, Struts 2.0, Hibernate 4.x/3.x


Confidential, Irving, TX

Sr. Big Data Architect


  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Involved in the end-to-end process of Hadoop jobs using technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs); extracted and loaded data with Sqoop into the data lake environment (Amazon S3), which was accessed by business users and data scientists.
  • Managed and supported enterprise data warehouse operations and advanced big data predictive application development using Cloudera and Hortonworks HDP.
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users.
  • Involved in designing and deploying a Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Installed Hadoop, MapReduce, and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
  • Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Pig, Hive, and HBase.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Used the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
  • Performed transformations such as event joins, bot-traffic filtering, and some pre-aggregations using Pig.
  • Developed MapReduce jobs to convert data files into Parquet file format.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Optimized MapReduce code and Pig scripts through performance tuning and analysis.

Environment: Hadoop, Java, MapReduce, AWS, HDFS, Redshift, Spark, Hive, Pig, Linux, XML, Eclipse, Cloudera CDH4/5 Distribution, DB2, YARN, SQL Server, Informatica, Oracle 12c, SQL, Scala, MySQL, R, Teradata, EC2, Flume, ZooKeeper.

Confidential, Dallas, TX

Sr. Big Data Architect


  • Worked closely with business analysts to convert business requirements into technical requirements and prepared low- and high-level documentation.
  • Worked on business problems to develop and articulate solutions using Teradata’s UDA Framework and multi-level data architecture.
  • Worked on analyzing different big data analytic tools including Hive, Impala, and Sqoop for importing data from RDBMS to HDFS.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Designed the high-level ETL architecture for overall data transfer from OLTP to OLAP.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN.
  • Created various documents such as the Source-to-Target Data Mapping Document, Unit Test Cases, and Data Migration Document.
  • Imported data from structured data sources into HDFS using Sqoop incremental imports.
  • Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
  • Created Hive tables and partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Involved in loading data from the UNIX file system to HDFS using Flume, Kettle, and the HDFS API.
  • Created Hive generic UDFs to process business logic with HiveQL and built Hive tables using list partitioning and hash partitioning.
  • Developed SQL scripts using Spark for handling different data sets and verifying performance against MapReduce jobs.
  • Involved in converting MapReduce programs into Spark transformations on Spark RDDs using Scala and Python.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle, MySQL) for predictive analytics.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Imported data from mainframe datasets to HDFS using Sqoop. Also handled importing data from various sources (e.g., Oracle, DB2, Cassandra, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Developed the technical strategy for integrating Spark for pure streaming and more general data-computation needs.
  • Wrote Pig Latin scripts and developed UDFs for Pig data analysis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Used Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Wrote scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
  • Optimized the mappings using various optimization techniques and debugged existing mappings using the Debugger to test and fix them.
  • Updated maps, sessions, and workflows as part of ETL changes, modified existing ETL code, and documented the changes.
  • Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.

Environment: Hadoop, Java, MapReduce, AWS, HDFS, Redshift, Spark, Hive, Pig, Linux, XML, Eclipse, Cloudera CDH4/5 Distribution, DB2, YARN, SQL Server, Informatica, Oracle 12c, SQL, Scala, MySQL, R, Teradata, EC2, Flume, ZooKeeper.

Confidential, Chicago, IL

Sr. Big Data Architect/Developer


  • Maintained Hadoop, Hadoop ecosystem components, and databases with updates/upgrades, performance tuning, and monitoring.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Worked on distributed frameworks such as Apache Spark and Presto on Amazon EMR and Redshift, and interacted with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
  • Responsible for gathering data from multiple sources such as Teradata, Oracle, and SQL Server.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Extracted files from MySQL through Sqoop, placed them in HDFS, and processed them.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Used Oozie and ZooKeeper operational services for coordinating the cluster and scheduling workflows.
  • Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
  • Developed custom Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
  • Worked on the implementation of a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Maintained different cluster security settings and was involved in the creation and termination of multiple cluster environments.
  • Worked in an Agile environment that uses Jira to maintain story points and a Kanban model.
  • Created and implemented a highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
  • Handled structured and unstructured data and applied ETL processes.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Loaded data from Teradata to HDFS using Teradata Hadoop connectors.
  • Used Amazon EMR to simplify big data processing and to manage the Hadoop framework.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS.
  • Used Hive to do analysis on the data and identify different correlations.
  • Worked on importing and exporting data between Oracle and DB2 and HDFS and Hive using Sqoop.

Environment: Hadoop, HDFS, MapReduce, Unix, REST, Redshift, Python, Pig, Hive, HBase, Storm, NoSQL, Flume, ZooKeeper, Kibana, Cloudera, Hortonworks, SQL, Amazon Web Services, SAS, Vertica, Kafka, Cassandra, Informatica, Teradata, Scala, Spark Streaming, Spark.

Confidential, Harrisburg, PA

Sr. Big Data/Hadoop Developer


  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Automated all jobs, from pulling data from storage to loading data into MySQL, using shell scripts.
  • Automated all jobs, from pulling data from sources such as MySQL and pushing the result datasets to the Hadoop Distributed File System to running MapReduce and Pig/Hive jobs, using Kettle and Oozie (workflow management).
  • Rendered and delivered reports in desired formats by using reporting tools such as Tableau.
  • Worked on syncing Oracle RDBMS to the Hadoop database (HBase) while retaining Oracle as the main data store.
  • Developed shell scripts to automate the conversion process from JSON to BSON.
  • Used Kafka with twitter4j to stream data from the source to Hadoop.
  • Performed offline analysis on HDFS and sent the results to MongoDB databases to update information in existing tables; the move from Hadoop to MongoDB was done using MapReduce and Hive/Pig scripts via the Mongo-Hadoop connector.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Created partitioned tables in Hive and mentored the analyst and test teams in writing Hive queries.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from various sources.
  • Performed ETL with Python-SQL Server pipelines/framework for data analytics and visualization using the Python, NumPy, SciPy, pandas, and MATLAB stack.
  • Worked on moving tables from Teradata to Hadoop using Sqoop.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Environment: Hadoop, HBase, HDFS, MapReduce, Teradata, SQL, Cloudera, Ganglia, Pig Latin, Sqoop, Hive, MySQL, Oozie, Flume, Informatica, ZooKeeper, R, and Python.

Confidential, Buffalo, NY

Sr. Java/Hadoop Developer


  • Involved in all phases of Software Development Life Cycle (SDLC).
  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Worked on installing and configuring MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Wrote Hive UDFs in Java where the required functionality was too complex.
  • Used Pig (Pig Latin) scripts for ad-hoc data retrieval.
  • Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
  • Created data models for Hive tables.
  • Created Hive tables and wrote Hive queries using HiveQL.
  • Worked on an AJAX implementation for retrieving content and displaying it without reloading the existing page.
  • Used Spring framework for Dependency Injection and integrated with Hibernate.
  • Developed the persistence service layer using Hibernate to populate and fetch data from the database.
  • Worked extensively with Hibernate Query Language (HQL) to store and retrieve data from the Oracle database.
  • Implemented authentication and authorization using Spring Security.
  • Involved in the configuration of Hibernate O/R mapping files.
  • Developed shell scripts to run the nightly batch cycle and to set environment variables.
  • Used Maven to build the project, run unit tests, and deploy artifacts to a Nexus repository.
  • Involved in writing SQL queries and procedures.
  • Developed RESTful Web Services to retrieve mutual funds data.
  • Used SoapUI to test the web services.
  • Used the JMS API for asynchronous communication to put messages in the message queue.
  • Used log4j for logging the information.
  • Involved in documenting application test results, fixing bugs and enhancements.
  • Responsible for configuring and deploying application in Development environment and releasing

Environment: Apache Hadoop, CDH 3 (Cloudera Distribution), Java 4, HDFS, Hive, Sqoop, Eclipse, Java 1.5, JSP, HTML, JavaScript, Spring MVC, Hibernate, Jersey, SOAP UI, Oracle 10g, JBoss, JMS 1.1, ActiveMQ, Maven


Java Developer


  • Responsible for requirement gathering and analysis through interaction with end users.
  • Led the team in designing use-case diagrams, class diagrams, and interaction diagrams using UML with Rational Rose.
  • Designed and developed the application using various design patterns, such as session facade, business delegate and service locator.
  • Worked with the Maven build tool.
  • Involved in developing JSP pages using Struts custom tags, jQuery, and the Tiles framework.
  • Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation.
  • Good experience in Mule development.
  • Developed rich Internet web applications using Java applets, Silverlight, and JavaFX.
  • Involved in creating database SQL and PL/SQL queries and stored procedures.
  • Implemented Singleton classes for property loading and static data from DB.
  • Debugged and developed applications using Rational Application Developer (RAD).
  • Developed a Web service to communicate with the database using SOAP.
  • Developed DAO (data access objects) using Spring Framework.
  • Deployed the components to WebSphere Application Server.
  • Actively involved in backend tuning of SQL queries and DB scripts.
  • Wrote commands using UNIX shell scripting.
  • Involved in developing other subsystems' server-side components.
  • Used Asynchronous JavaScript and XML (AJAX) for better and faster interactive Front-End.
  • Developed unit test cases and used JUnit for unit testing of the application.
  • Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions. Resolved higher-priority defects as per the schedule.

Environment: Java 1.6, Servlets, JSP, Struts 1.2, IBM Rational Application Developer (RAD) 6, WebSphere 6.0, iText, AJAX, Rational ClearCase, Rational Rose, Oracle 9i, log4j.
