Big Data Architect Resume

Alexandria, VA


  • Over 12 years of experience in Development, Analysis, Design, Integration, and Presentation with ETL tools, along with 4 years of Big Data/Hadoop experience across the Hadoop ecosystem, including Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python, and AWS.
  • Worked in India, Sweden, and Denmark; currently working in the USA on various technical, functional, and business assignments.
  • Solid understanding of Hadoop MRv1 and Hadoop MRv2 (YARN) architecture.
  • Hands-on experience with core Java and solid experience debugging MapReduce applications.
  • Strong experience building and debugging Spark and Scala applications using functional programming concepts.
  • Experienced in importing data into HDFS from various relational database systems using Sqoop.
  • Worked on standards and proof of concept in support of CDH4 and CDH5 implementation using AWS cloud infrastructure.
  • Hands-on experience implementing complex business logic and optimizing queries in HiveQL, and controlling data distribution through partitioning and bucketing to enhance performance.
  • Good knowledge of Architecting, Designing, Re-engineering, and Performance Optimization.
  • Experienced in building highly scalable Big Data solutions using multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
  • Experienced with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Very good knowledge of logical and physical data modeling, including creating new data models, data flows, and data dictionaries. Hands-on experience using relational databases like Oracle and MS SQL Server.
  • Experienced with version control systems like SVN and Serena.
  • Extensively worked on ETL processing tools like Pentaho and SSIS.
  • Knowledge of different reporting tools (SSRS, Cognos).
  • Exceptional ability to learn and master new technologies and to deliver results under tight deadlines.
  • Quick learner with a high degree of passion and commitment to work.
  • Domain expertise in Health Insurance, Telecom, Finance, and Airline.
  • Proficient in mentoring and onboarding engineers who are new to Hadoop and getting them up to speed quickly.
  • Knowledge of MPP databases like Impala and SQL Server APS PolyBase.


Platforms/Cloud Services: Windows, Ubuntu, Linux, CentOS 7, EMR, S3

Big Data Technologies: Apache Hadoop, HDP 2.2.4, CDH5, Hive, Sqoop, Pig, Flume, ZooKeeper, HBase, Solr, Spark, Storm, Kafka

Machine Learning Platforms: Apache Mahout

Web Servers: Apache 1.3/2.x, IIS 5.0/6.0

Application Servers: BEA WebLogic, Apache Tomcat 5.x/7.x

Languages: Shell scripting, HTML, XML, SQL, C#

Packages: J2EE, JDK

Databases: Oracle, SQL Server, MySQL

Tools: Apache Solr, Jenkins, MATLAB, JIRA, Visual Studio 2010/2008

Version Control: SVN, GitHub, CVS

BI Tools: SSIS, SSRS, SSAS, Cognos Report Studio 10


Confidential, Alexandria, VA

Big Data Architect


  • Defined the logic for calling the CMS REST API to retrieve the documents for a patent application and load them into HDFS.
  • Designed and implemented ETL pipelines in Spark using Python.
  • Responsible for developing NiFi flows, creating schedules for data extraction, and running pipelines by identifying their dependencies.
  • Designed ETL flows that combine data from different sources and export it to Elasticsearch.
  • Responsible for developing column mapping, transformations and data profiling tasks.

Confidential, Waltham, MA

Big Data Sr. Consultant


  • Defined the logic for identifying visitor stitching use case scenarios and for identifying Visit and Sub Visit.
  • Designed and implemented an ETL solution in Spark for visitor stitching.
  • Processed data from AWS S3 buckets and loaded it into EDW tables.
  • Implemented optimization techniques, persistence/caching, and compression concepts gained through development experience.
  • Responsible for developing column mapping, transformations and data profiling tasks.
  • Responsible for scheduling ETL tasks in Airflow, which is an ETL task scheduler and dependency manager.
  • Worked extensively on debugging and making changes to the Spark ETL solution.
  • Configured Git repositories and used different sbt plugins to manage the build dependencies.
  • Developed a testing framework in Scala to test different use cases in visitor stitching.
  • Deployed the developed JAR files to Qubole and verified that the Spark environment spun up correctly with the latest changes.
  • Loaded data from Parquet files into APS tables using scheduled jobs.

Confidential, Jersey City, NJ

Big Data Sr. Consultant


  • To react to fraud in a timely manner, developed a lookup of any user's history within a few milliseconds, with options to use HBase NoSQL databases and caches for quick retrieval of information.
  • Worked on various machine-learning models and applied them to data to find correlations between different entities.
  • Hands-on experience with models such as Linear Regression, Decision Trees, Naïve Bayes, Random Forests, and K-means Clustering; worked with the MLlib and GraphX libraries and Spark Streaming.
  • Loaded data into HBase using the HBase Shell and the HBase Client API.
  • Very good understanding of Spark core concepts and internals, including partitions, memory management, the functions of executors and drivers, shuffling, optimization techniques, persistence/caching, and compression, gained through development experience.
  • Responsible for developing data pipelines using Flume and Kafka to extract data and store it in HDFS.
  • Used Maven extensively to build JAR files of Java MapReduce programs and deployed them to the cluster.
  • Created multi-module projects using SBT, developed the build script, and used various sbt plugins to manage the build dependencies.


Senior Big Data Developer


  • Responsible for ingesting data into HDFS using Sqoop from the relational databases SQL Server and DB2.
  • Responsible for reconciliation between current records in Hive and new change records.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, MongoDB, NoSQL, and a variety of portfolios.
  • Hands-on experience adjusting the memory settings for Map or Reduce tasks and applying any other performance optimizations needed for jobs to complete within the agreed SLA time.
  • Responsible for debugging failed jobs in production.


ETL Architect


  • Utilized SQL*Loader to load flat files into database tables.
  • Responsible for SQL tuning and optimization using Analyze, Explain Plan, TKPROF utility and optimizer hints.
  • Responsible for performing code reviews.
  • Designed and created different ETL packages using SSIS, with Oracle as the source and SQL Server as the destination.
  • Installed, upgraded, and migrated databases from Oracle to MS SQL Server 2008.
  • Transformed data from various data sources over OLE DB connections by creating various SSIS packages.


Senior Database Developer and Analyst


  • Responsible for analyzing business functionality for the back-end processes.
  • Performed SQL Server 2008 RDBMS database development using T-SQL programming, queries, stored procedures, and views.
  • Performed tuning and optimization on SQL queries using Explain Plan and optimizer hints.
  • Created SQL reports, data extraction scripts, and data loading scripts for different databases and schemas.
  • Developed SSIS packages using different types of tasks, with error handling.
  • Developed reports in SSRS on SQL Server 2005.
  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data to Data Warehouse and Data Mart Databases.


ETL Developer


  • Handled estimation, analysis of FRDs, preparation of design documents, writing of unit test cases, coding, unit testing, and guidance of team members.
  • Responsible for package deployment and development.
  • Responsible for troubleshooting production problems and writing queries for front-end designers.
  • Used DTS for data population from different sources and created DTS packages for data conversions and for loading data from flat files or Excel files.
  • Fine-tuned complex SQL queries and stored procedures to increase performance.
  • Worked as a developer creating complex stored procedures, DTS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.