Big Data Architect Resume

Chicago, IL

SUMMARY

  • 17+ years of IT experience spanning Big Data Architecture, Hadoop, Cloud Computing, Parallel and High Performance Computing, Mobile technologies, Java/J2EE, JavaScript frameworks, and several other open source frameworks.
  • 9 years of experience in Big Data, Spark, Kafka, Hadoop, Cloud Computing, NoSQL, and NiFi, plus 2.5 years of experience with the parallel computing (Big Data) framework MPI; published several conference papers on my thesis work, “Parallel Framework for Large Scale Unstructured Mesh Generation”.
  • Extensive experience as a Big Data Architect and Team Lead; successfully led several small and large onshore and offshore development teams.
  • Expertise in building real time analytics applications using Kafka, Spark Structured Streaming, and Spark Core.
  • Expertise in Hadoop ecosystem components such as Hive, HiveQL, Pig, MapReduce, HDFS, YARN, and Sqoop.
  • Apache Spark certified, with good experience in NoSQL databases such as HBase and DynamoDB.
  • Experience in Hadoop administration: setting up Hadoop clusters, cluster planning, job scheduling, securing clusters, backup and recovery, and Hive and HBase administration.
  • Expertise in designing and developing solutions using AWS cloud products such as Amazon S3, RedShift, EC2, SQS, CloudWatch, EMR, RDS, and SES.
  • Performed a detailed evaluation of Big Data technologies such as Hortonworks, Cloudera, and Amazon Elastic MapReduce (EMR).
  • Good understanding of Data Warehousing environments, ETL tools such as Talend and Informatica, and BI reporting tools such as QlikView.
  • Extensive experience using the parallel computing (Big Data) framework Message Passing Interface (MPI).
  • Developed a parallel computing (Big Data) framework for generating large-scale meshes of simulated human skulls, brains, lungs, and similar objects to study the impact on these objects during accidents.
  • Advised organizations on big data strategy and implementation, recommended the technologies that best fit their needs, and implemented the selected big data solutions.
  • Worked with data scientists and helped them extract actionable insights from huge amounts of data on the Hadoop cluster.
  • Mentored, consulted with, and supported engineering teams, data scientists, and business users through collaboration and reviews.
  • Expertise in using Node JS, Ext JS, Ajax, JavaScript, Spring, Hibernate, Seam, JSF, Rich Faces, and Struts frameworks.
  • Expertise in designing and developing applications using Java based technologies, such as Java, J2EE, EJB 3/2.x, Servlets, JSP, JDBC, JSTL, SOAP and REST based Web Services using Jersey and Apache Axis.
  • Extensive experience in managing offshore teams in different geographic locations and time zones.
  • An excellent leader, problem solver, and quick learner with strong analytical skills; able to perform at a high level to meet deadlines and adapt to ever-changing priorities.
  • Experience in various industry verticals - Telecommunications, Networking, Financial Services, Engineering, Healthcare, Retail and Technology.
  • Extensive experience in Waterfall and Agile development methodologies. Ability to quickly and effectively develop key relationships with a range of project participants, including executives, directors, managers, vendors, client stakeholders, and developers. Strong conceptual and analytical skills. Excellent verbal and written communication and presentation skills.

TECHNICAL SKILLS

Spark Technologies: Spark Core, Spark Structured Streaming, Spark SQL

Big Data Technologies: Apache Kafka, Spark, NiFi, Hive, MemSQL, Flume, Pig, Sqoop

Hadoop Technologies: Hadoop, HDFS, MapReduce, MRV2, YARN

Hadoop Distributions: Cloudera, Hortonworks, Amazon Web Services (AWS) EMR

NoSQL Databases: Cassandra, HBase, DynamoDB, MongoDB

AWS Core Services: EMR, S3, RedShift, EC2, SQS, EBS, RDS

Parallel Computing: Message Passing Interface (MPI), PVM

Programming Languages: Scala, Java, Python, JavaScript, C, C++, SQL, PL/SQL

JavaScript Frameworks: Node.js, Ext JS, Angular JS, Express JS, Ajax, Ajax4jsf

Java Technologies: JDK 8, JSF, EJB 3.0/2.x, Servlets, JSP, JMS, JDBC

Open Source Frameworks: Seam, Spring, Hibernate, iBATIS, Struts 1.x/2, Rich Faces, JPA

Mobile Technologies: BREW, J2ME, Android

Web Services: REST, Jersey, Axis 1.x, SOAP, WSDL, UDDI

Databases: Oracle, MySQL, SQL Server, DB2, Informix

Servers: JBoss 4.x, WebSphere, Weblogic, Tomcat, Apache

Version Control: GitHub, GitLab, Subversion, TFS, CVS, Clear Case

Operating Systems: Linux, UNIX, Sun Solaris, Windows

Others: Oracle Golden Gate (OGG), Visio, Gliffy, Marketo

PROFESSIONAL EXPERIENCE

Confidential, Chicago, IL

Big Data Architect

Responsibilities:

  • As a hands-on big data architect, created reusable Spark components for processing clinical trial data and successfully guided developers to deliver multiple solutions to Pharmaceutical clients like Johnson & Johnson (J&J), GlaxoSmithKline (GSK), and Merck.
  • Successfully guided both onshore and offshore teams and delivered multiple big data projects within tight timelines.
  • Developed reusable Spark components such as Source Data Acquisition, File Loader, and File Generation for use across multiple clinical trial study data pipelines (see the loader sketch at the end of this role).
  • Worked with the eCOA clinical trial business team to understand the business requirements of multiple clinical trial studies for various clients.
  • Led a team of 8 developers and successfully delivered various studies that were set up in the eCOA clinical studies portal.
  • Designed GSK, Merck, JNJ, and eCOA projects with Spark, Scala, Spark SQL and Hive technologies.
  • Created architecture diagrams for all the big data projects in Hadoop EDGE platform.
  • Provided security for PII and PHI data by creating encryption zones in CDP with Apache Ranger.
  • Conducted detailed code reviews and provided feedback for big data projects in the Hadoop EDGE platform.
  • Involved in customer audit meetings with clients and top management team.

Environment: Apache Spark, Scala, Spark SQL, Kafka, Hadoop, HDFS, Hive, Cloudera, Big Data, Java, SQL, Oracle, Postgres, Toad, Data Warehouse, GitLab, Lucid Chart
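
A minimal Scala/Spark sketch of the reusable File Loader idea described above, assuming CSV study extracts with a known schema; names such as ClinicalFileLoader, visitSchema, and the study path are illustrative placeholders, not the actual project code.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.types._

    object ClinicalFileLoader {

      // Example schema for a study extract; each real study would supply its own.
      val visitSchema: StructType = StructType(Seq(
        StructField("subject_id", StringType, nullable = false),
        StructField("visit_date", DateType, nullable = true),
        StructField("site_id", StringType, nullable = true),
        StructField("measurement", DoubleType, nullable = true)
      ))

      // Load one study file set as a DataFrame so every study reuses the same entry point.
      def load(spark: SparkSession, studyPath: String, schema: StructType): DataFrame =
        spark.read
          .option("header", "true")
          .schema(schema)
          .csv(studyPath)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ClinicalFileLoader").getOrCreate()

        // Hypothetical HDFS path for one study's source files.
        val visits = load(spark, "hdfs:///edge/studies/study_001/visits/*.csv", visitSchema)
        visits.createOrReplaceTempView("visits")

        // Downstream study logic can then be expressed in Spark SQL against the loaded view.
        spark.sql("SELECT site_id, COUNT(*) AS visit_count FROM visits GROUP BY site_id").show()

        spark.stop()
      }
    }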

Confidential, Chicago, IL

Principal Big Data Architect

Responsibilities:

  • As a principal big data architect, created a Hadoop cluster with Kafka from scratch and successfully guided multiple teams in adopting big data technologies.
  • Analyzed existing Confidential platforms and gave detailed presentations to the executive team (CTO, CIO, SVP) on adopting big data technologies at Confidential.
  • Guided big data administrators and infrastructure team to set up on-prem Hadoop cluster with Kafka from scratch for real time data analytics in Dev, QA, and Prod environments.
  • Successfully migrated the existing heavy-load (60M transactions per day) RSC Link application (developed in C) to the new debit and credit card transaction processing system (CTPS) using Kafka, Spark Structured Streaming, and Oracle Golden Gate (a CDC tool).
  • Designed the debit and credit card transaction system so that message ordering is preserved from the source through Kafka all the way to downstream systems, while achieving high performance through parallel processing with Spark Structured Streaming (see the streaming sketch at the end of this role).
  • Guided developers to implement complex fee calculation logic for each card transaction using Spark SQL after converting binary data to a readable format in the processing layer.
  • Designed the solution so that new applications can reuse template-based code; all transformation logic is implemented in Spark SQL to leverage SQL resources on the project.
  • Successfully created Oracle Golden Gate Big Data (OGGBD) replicat processes to push real time card transactions data from Tandem Systems to Kafka.
  • Successfully migrated a couple of batch applications from Netezza (EDW) to real-time applications using Golden Gate, Kafka, Spark, HDFS, Hive, and Tableau.

Environment: Apache Kafka, Spark Structured Streaming, Spark, Spark SQL, Scala, Hadoop, HDFS, Cloudera, CDH 6.2.1, Netezza, SQL, Data Warehouse, Tableau, Oracle Golden Gate Big Data, Visio, GitLab.
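
A minimal Scala sketch of the ordering-preserving streaming pattern described above, assuming the upstream producer keys each record by card number so that one card's transactions stay within a single Kafka partition; broker addresses, topic names, schema fields, and fee rates are illustrative assumptions rather than the actual CTPS configuration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object CardTxnStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CardTxnStream").getOrCreate()
        import spark.implicits._

        val txnSchema = new StructType()
          .add("card_number", StringType)
          .add("amount", DoubleType)
          .add("txn_type", StringType)
          .add("txn_ts", TimestampType)

        // Each Kafka partition is read in order; keying by card number upstream
        // preserves per-card ordering while partitions are processed in parallel.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "card-transactions")
          .load()

        val txns = raw
          .select(from_json($"value".cast("string"), txnSchema).as("t"))
          .select("t.*")

        // Fee logic expressed in Spark SQL so SQL resources can maintain it (illustrative rates).
        txns.createOrReplaceTempView("txns")
        val withFees = spark.sql(
          """SELECT card_number, amount, txn_type, txn_ts,
            |       CASE WHEN txn_type = 'CREDIT' THEN amount * 0.020
            |            ELSE amount * 0.005 END AS fee
            |FROM txns""".stripMargin)

        val query = withFees.writeStream
          .format("parquet")
          .option("path", "hdfs:///ctps/transactions")
          .option("checkpointLocation", "hdfs:///ctps/checkpoints/transactions")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }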

Confidential, Chicago, IL

Senior Big Data Solution Architect

Responsibilities:

  • As a hands-on Big Data Architect, designed, developed, and provided architectural guidance for several projects on the Confidential Keystone Analytics Platform.
  • Successfully guided both onshore and offshore teams and delivered multiple big data projects within tight timelines.
  • Provided architectural guidance to multiple big data projects like eCommerce, Morning Foods, Nielsen IRI, Demand Planning, Case Fill, and 10 projects as part of Richmond project initiative.
  • Conducted detailed code reviews and provided feedback for all of Confidential's big data projects in the Hadoop and MemSQL environments.
  • Defined naming and programming standards for both Hadoop and MemSQL platforms.
  • Provided technical guidance to all the projects in the Confidential MemSQL environment and worked with the MemSQL development team to resolve several issues in MemSQL version 5.0. MemSQL is an in-memory, distributed SQL database management system.
  • Responsible for administration of MemSQL cluster and upgrades in Prod, QA and Dev environments.
  • Developed a generic ETL framework (Keystone Data Loader) with Spark to transfer data between the three Keystone platforms (Hadoop, MemSQL, SQL Server), because most projects require data from at least two of the three (see the loader sketch at the end of this role).
  • Enhanced the Keystone Data Loader framework for high performance; with this Spark-based framework, brought data transfer times down from about an hour (with SAP BODS and other ETL tools) to 4-5 minutes.
  • Migrated a couple of existing on-prem SQL Server data warehouse projects to Hadoop (cloud migration).
  • Designed and developed Crunch and Orchestro Customer Data projects using Apache Spark framework and NiFi in Hadoop platform.
  • The Crunch project contains sensitive data; provided security by creating encryption zones in the Hortonworks platform with Knox and Ranger.
  • Involved in design meetings with Confidential Enterprise Architects, Infrastructure, and Data Science teams.

Environment: Apache Spark, NiFi, MemSQL, Scala, Spark SQL, Hadoop, HDFS, Hive, Hortonworks, Big Data, AWS S3, Cloud Migration, Apache Knox, Apache Ranger, SQL Server, Toad, SQL, Data Warehouse, SAP BODS, SAP, Tableau, TFS.
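
A minimal Scala sketch in the spirit of the Keystone Data Loader described above, assuming MemSQL is reached over its MySQL-compatible JDBC endpoint and the appropriate JDBC driver is on the classpath; connection URLs, table names, and credentials are placeholders, not the actual project configuration.

    import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

    object KeystoneDataLoader {

      // Read a Hive table registered in the Hadoop platform.
      def readHive(spark: SparkSession, table: String): DataFrame =
        spark.table(table)

      // Write the same DataFrame to a JDBC target (MemSQL or SQL Server) in parallel.
      def writeJdbc(df: DataFrame, url: String, table: String, user: String, password: String): Unit =
        df.write
          .format("jdbc")
          .option("url", url)
          .option("dbtable", table)
          .option("user", user)
          .option("password", password)
          .option("numPartitions", "8") // parallel writers drive the speed-up over single-threaded ETL tools
          .mode(SaveMode.Append)
          .save()

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("KeystoneDataLoader")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical source table and target endpoint.
        val sales = readHive(spark, "keystone.daily_sales")
        writeJdbc(sales, "jdbc:mysql://memsql-host:3306/keystone", "daily_sales",
          sys.env.getOrElse("DB_USER", "loader"), sys.env.getOrElse("DB_PASS", ""))

        spark.stop()
      }
    }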

Confidential, San Rafael, CA

Big Data Architect

Responsibilities:

  • As a hands-on Big Data Architect, designed, developed, and provided technical guidance for the Confidential Data Platform (ADP) and Marketing Automation Platform (MAP) projects.
  • Designed and Developed ADP real time data ingestion module using Spark, Scala, Kafka, HBase, Hive, and Attunity Replicate.
  • Used Spark Streaming to receive real-time CDC data from Attunity Replicate through Kafka (see the ingestion sketch at the end of this role).
  • Used Spark RDD functions, transformations, and actions for processing real time data.
  • Developed all the Spark programs using Scala. Used Scala collections, traits, higher order and anonymous functions.
  • Created views over Hive tables that contain full and incremental data to provide access to real-time data.
  • As part of the initial Marketing Automation Platform (MAP) project, worked with the extended Product team to implement an Event Based Ingestion (EBI) solution for data ingestion using Flume, Kafka, S3, Docker containers, and Mesos.
  • Received user email activity data from Sendwithus and ingested that real-time data into S3 from Kafka via custom Flume handlers.
  • Implemented a centralized Data Lake in Hadoop with data from various sources.
  • Used Sqoop to import data from various RDBMS tables into the Confidential Data Lake (HDFS) on a daily basis.
  • Developed Fast Access using Amazon Redshift and connected it to QlikView for visualizations and reporting.
  • Designed Subscription Messaging and MailBroker applications to utilize Marketo (marketing automation software) to send emails to Confidential end customers and vendors.
  • Copied Marketo activity data to Amazon S3 (Simple Storage Service) using Sqoop and performed several analytics on Marketo activity data using Hive.

Environment: Apache Spark, Spark Streaming, Kafka, Scala, Spark SQL, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HBase, Big Data, NoSQL, Qubole, Docker Containers, Mesos, Cleo Harmony, Amazon Web Services, AWS S3, RedShift, EMR, EC2, SQS, Node JS, Dynamo DB, Denodo, Marketo, SendWithUs, SQL Workbench, Data Warehouse, QlikView, GitHub, Sparx Enterprise Architect, Scrum, Agile Methodology.
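
A minimal Scala sketch of the Spark Streaming CDC ingestion path described above, assuming Attunity Replicate publishes change records as JSON strings to a Kafka topic; the broker, topic, consumer group, and output paths are illustrative placeholders rather than the actual ADP configuration.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object AdpCdcIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("AdpCdcIngest")
        val ssc = new StreamingContext(conf, Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "adp-cdc",
          "auto.offset.reset" -> "latest"
        )

        // Direct stream from the CDC topic; each record value is one change event.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("adp-cdc-topic"), kafkaParams))

        // Land the raw change events on HDFS; Hive views over the full and
        // incremental directories then expose the merged, near-real-time picture.
        stream.map(record => record.value())
          .foreachRDD { rdd =>
            if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///adp/cdc/incremental/${System.currentTimeMillis()}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }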

Confidential, Atlanta, GA

Hadoop Lead Developer

Responsibilities:

  • Analyzed and generated reports on historical NS train accident, repair, and maintenance data using the Apache Hadoop, Pig, and Hive frameworks.
  • Developed several MapReduce programs to extract necessary information from unstructured data provided by ITC messaging and various other systems.
  • Used the Pig framework to categorize wayside status, base station, and subscription messages.
  • Used the Hive framework to generate reports from train accident, repair, and maintenance information.
  • Wrote several HiveQL queries to perform data analysis on locomotive and base station data.
  • Used Sqoop to transfer big data between HDFS and relational databases such as DB2 and Oracle.

Environment: Apache Hadoop, Hive, Sqoop, Big Data, NoSQL, MongoDB, Cassandra, Cloudera, HBase, ZooKeeper, Pig, Sencha Ext JS, Ajax, JavaScript, PVCS, REST, Jersey, Web Services, Ant, HP Service Manager, Linux, Windows.

Confidential, Birmingham, AL

Research Assistant

Responsibilities:

  • Designed and implemented a parallel framework for large scale mesh generation.
  • Used Message Passing Interface (MPI) for inter-process communication and to perform parallel I/O.
  • To achieve additional performance gains, created and managed threads with POSIX threads.
  • Used the METIS library for domain decomposition and the advancing front technique for mesh generation.
  • Developed several parallel algorithms and achieved load balancing across processes.
  • Published several papers in high performance computing and engineering conferences.

Environment: Message Passing Interface (MPI), METIS, POSIX, C, C++, Linux
