
Big Data Architect Resume


McLean, VA

SUMMARY

  • 13+ years of SDLC experience in requirements, design, coding, testing and performance tuning.
  • 5 years as a big data architect & engineer at JPMorgan Chase using Spark and Hadoop frameworks.
  • 5+ years in HDFS, Sqoop, Hive, Pig, Spark SQL, Spark Streaming, MapReduce, Impala, Kafka & YARN.
  • 8+ years in Java/J2EE, Hibernate, MyBatis, REST/SOAP web services, JMS, MQ, Drools.
  • 3+ years in Scala & spark-shell; 3+ years in Python & PySpark.
  • 2+ years in NoSQL databases: HBase, Cassandra, MongoDB, CouchDB, MarkLogic, Redis, Neo4j.
  • 2+ years in cluster administration, ticket/issue resolution, deployment, monitoring & tuning.
  • 2+ years in Kafka streaming, Spark Streaming, Hive streaming and Kafka Confluent streaming.
  • 4+ years in MPP data warehouses (Teradata, Greenplum, Exadata, Vertica) and data modeling.
  • 4+ years in ETL (Ab Initio, Informatica, Pentaho, Talend) and master data management (MDM).
  • 2+ years in BI reporting (Tableau, QlikView, Cognos) & BA analytics (R and SAS) on Hive & Impala.
  • 8+ years in relational databases (Oracle, MySQL, SQL Server, Postgres), SQL queries, PL/SQL.
  • 2+ years in UI with JSF, jQuery, Ajax, JavaScript, HTML5, XML, D3.js, Node.js, AngularJS, Bootstrap.
  • 4+ years in Spring Core, Spring MVC, Spring REST, Spring Boot, Spring Data JPA, Spring Security.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Very strong in architecting end-to-end big data projects/POCs and building data lakes.
  • Experience in MVC, Lambda architecture, SOA, GoF design patterns, TOGAF, data structures and algorithms.
  • Experience in identifying use cases, preparing RFP project proposals and bringing work to the team.
  • Experience in building reusable frameworks, automations, best practices and design checklists.
  • Expert in performance tuning, benchmarking, innovating and building optimization techniques.
  • Successful team player with extensive experience in team leading, team building and team contribution.
  • Experience working with business, management, architects and global cross-functional teams.
  • Experience in planning, prioritization, sizing, effort estimation, risks/issues escalation.
  • A decade of experience working on agile Scrum projects, leveraging Scrum Master certification.
  • Experience in the banking, healthcare, telecom and insurance domains, and in in-house applications.
  • Experience in requirements gathering, detailing, converting requirements to design documents.
  • Experience in preparing unit test cases, deployment checklists, code review documents.
  • Recognized as a situational leader during high-pressure, fast-paced development; able to meet aggressive timelines, complex & changing requirements and high-visibility projects.

SKILLSET

Spark 1.x, 2.0: Spark Core, Spark SQL, Spark Streaming, Spark ML, spark-shell, PySpark

Hive 0.x, 1.x: HQL, partitions, buckets, compression, SerDes, Beeline, HCatalog, Hive streaming

Hadoop 2.x: HDFS, MapReduce, YARN, Mesos, formats (Parquet, ORC, Avro, Text, JSON)

Programming: Java 4/5/6/7/8, Scala 2.8/2.10, Python 2.7/3.0, R 3.3, Unix/Linux shell scripting

Key frameworks: Sqoop 1.4.6, Pig 0.16, Kafka 0.10, Impala 2.7, Flume 1.7, Oozie 4, ZooKeeper 3.4

NoSQL: HBase 5.8, Cassandra 3.2, MongoDB, CouchDB, MarkLogic, Redis, Neo4j

Distribution: Cloudera CDH 5.8, Hortonworks HDP 2.3, MapR 5.0, Amazon (AWS), Microsoft Azure

Streaming: Kafka streaming, Kafka Confluent streaming, Hive streaming, Spark Streaming

Apache FW: Flink, Storm, Samza, Drill, Mahout, Sentry, Phoenix, Kerberos, Stinger, Kylin

Security FW: Centrify, Kerberos, KMS, LDAP, Knox, HDFS encryption/decryption

Apache UI: NiFi, Ambari, Zeppelin, Livy, Hue

Search, logs: Lucene, Solr, Elasticsearch; ELK stack (Elasticsearch, Logstash, Kibana), Splunk

AWS: EC2, S3, EMR, Redshift (DW), DynamoDB (NoSQL), IoT, CloudSearch

Java/JEE: Core, Spring, Hibernate, MyBatis, REST/SOAP web services, RabbitMQ, SonicMQ, Soup MQ, IBM MQ, Drools

ETL: Ab Initio 3.3, Pentaho 7, Informatica 10, Talend 6.3

BI: QlikView, Qlik Sense, Cognos, Tableau, Splunk, D3.js, Spotfire, Jaspersoft

Warehouse: Teradata 15.10, Greenplum 4.3.10, Exadata, Vertica, Netezza

Database: Oracle, SQL Server, DB2, MySQL, Postgres, PL/SQL

Web UI: JSF, jQuery, Ajax, JavaScript, HTML5, XML, Node.js, AngularJS, Bootstrap

Servers: Spark Job Server, Spray, Play, Tomcat, WebLogic, JBoss

BPM: Collibra, Confidential PRPC 6.1/6.2/7, BPEL, WPS, WLI

Development: Eclipse Neon 4.6, Maven 3.3.9, IntelliJ 2016, SBT 0.13, Erwin 7.3

SCM/Build tools: Git, GitHub, SVN, TortoiseSVN, CVS, Jenkins, Puppet, Chef

PROFESSIONAL EXPERIENCE

Confidential, McLean, VA

Big Data Architect

  • Created the architecture for the entire project; built the data layers as Hive tables in the data lake, such as staging, archival, integration, metadata, work and semantic zones.
  • Provided designs for all the components right from ingestion to ETL to reporting & analytics.
  • Designed and implemented Change Data Capture (CDC) using Spark SQL DataFrames; it supports full, snapshot and delta loads even when the source has no primary key or timestamp (see the CDC sketch after this list).
  • Implementing Meta Manager (M2) to capture metadata automatically during the execution of any job (Sqoop, Hive, Spark, Cassandra, MR, etc.), with no need to manually generate and store metadata; it shows table-level and column-level lineage using D3 charts, plus real-time dashboards for business, technical and operational metadata.
  • Integrated Collibra with the data lake to provide full data governance across the organization; created business assets for the HUB and PUB zones and data assets for the RAW zone.
  • Designed the ETL DRIVER in Python to perform the entire workflow orchestration, driven from a database.
  • Designed and implemented the CMR engine to build MDM, with a database storing all business, technical and operational metadata and a REST API for CRUD operations on that database.
  • Designed & implemented a file-based, event-driven publish-subscribe Kafka system.
  • Designed & implemented the operational data model to store the job, task and error details.
  • Designed five options for updating a huge billion-record table and 20M-record partitions.
  • Solved performance problems by keeping lookup and configuration tables in memory, reusing the same SparkContext throughout a job, and creating handshake files instead of objects (see the caching sketch after this list).
  • Evaluated the visual orchestration product CDAP and presented the pros and cons of buying in.
  • Proved that Scala over Spark runs faster and takes less memory than Python over Spark, due to faster Kryo serialization (vs. Pickle) and the non-availability of primitive data types in Python.
  • Designed Hive with ORC for the first time in the ETL integration zone to support ACID transactions (a DDL sketch follows this list).
  • Implemented faster Hive tables with partitions, buckets, formats, SerDes and indexes.
  • Generated massive volumes of data for testing using Talend ETL tool on Spark.
  • Designed a rule engine in Spark to execute rules during ETL and to filter error records; rules are stored in Hive, and it performs faster than Drools over Spark with session management.
  • Exploring real-time streaming from databases using options such as Kafka Confluent streaming, Hive streaming and Spark streaming.
  • Converted all the business use cases into detailed requirements, low level designs and test cases.
  • Improved the memory allocation of the cluster by changing the block size, replication & JVM heap.
  • Improved execution time by altering CPU cores, dynamic resource allocation and shuffling.
  • Fine-tuned the team's code by reading Hive queries from files and using DataFrame joins and saveAsTable.
  • Designed & Implemented 15 levels of calculations in both Scala and Python over Spark.
  • Suggested and configured Centrify and Kerberos for platform-level security.
  • Suggested and configured Knox for securing data in motion and KMS for securing data at rest
  • Generated QlikView reports on Impala tables and views in Parquet format through ODBC.
  • Implemented analytical components in Python over Spark leveraging the Sparkling Pandas API.
  • Built automation script to clean up unnecessary HDFS storage in the dev environment
  • Ingested events coming from upstream into the metadata zone using Spark Streaming's DStreams (see the streaming sketch after this list).
  • Implemented a transactional system storing finance data in Cassandra to make it real time.
  • Designed REST web services over spark-submit to provide data as a service to downstream systems.
  • Implementing the Lambda architecture in the existing project to support batch and real-time access.
  • Designing machine learning algorithms using Spark MLlib and analytics in SparkR over Impala.
  • Designing new approaches in configuring, deploying, scheduling, monitoring and tuning
  • Evaluated and brought the best list of frameworks and tools into the project, including Spark & Talend.
  • Implementing the M2 portal using Spring Core, Spring MVC, Spring REST, Spring Boot, Spring Data JPA and Spring Security.
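
Below is a minimal sketch of the whole-row-comparison CDC approach described above, assuming Spark 2.x with Hive support; the database, table and column names are illustrative, not from the actual project:

```scala
// CDC via whole-row comparison when the source has no primary key or timestamp.
import org.apache.spark.sql.SparkSession

object CdcSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("cdc-sketch").enableHiveSupport().getOrCreate()

    val previous = spark.table("archive.customer_snapshot") // prior full snapshot
    val current  = spark.table("staging.customer")          // today's extract

    // except() compares entire rows, so no key or timestamp is needed.
    val inserted = current.except(previous)  // new or changed rows
    val deleted  = previous.except(current)  // removed or replaced rows

    inserted.write.mode("append").saveAsTable("integration.customer_cdc_ins")
    deleted.write.mode("append").saveAsTable("integration.customer_cdc_del")

    spark.stop()
  }
}
```

Full and snapshot loads simply land the entire extract; the except-based comparison is what makes delta detection possible without a key or timestamp.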
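A sketch of the in-memory lookup/configuration caching mentioned above, again with illustrative table names: cache() keeps the small table resident for the life of the job, and broadcast() avoids a shuffle when joining it to a large fact table.

```scala
// Keep small lookup/config tables in memory and reuse one session across steps.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object LookupCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("lookup-cache").enableHiveSupport().getOrCreate()

    val lookups = spark.table("metadata.country_codes").cache()
    lookups.count() // materialize the cache once, up front

    val facts = spark.table("integration.transactions")

    // broadcast() ships the small table to every executor, avoiding a shuffle join.
    facts.join(broadcast(lookups), Seq("country_code"))
      .write.mode("overwrite").saveAsTable("semantic.transactions_enriched")

    spark.stop()
  }
}
```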
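A HiveQL sketch of a transactional ORC table of the kind described above for the integration zone; table, column and bucket choices are illustrative. Hive ACID (0.14+) requires ORC storage, bucketing and the transactional table property, with transactions enabled in the Hive configuration:

```sql
-- Transactional ORC table: ACID in Hive requires ORC + buckets + the
-- 'transactional' property (and transactions enabled in hive-site.xml).
CREATE TABLE integration.payments (
  account_id BIGINT,
  amount     DECIMAL(18,2),
  status     STRING
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (account_id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```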
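And a minimal sketch of event ingestion with Spark Streaming's DStreams, assuming the Kafka 0.8 direct-stream integration (spark-streaming-kafka); the broker, topic and landing path are illustrative:

```scala
// Micro-batch ingestion of upstream events from Kafka into HDFS.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object EventIngestSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("event-ingest")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("upstream-events"))

    // Land each non-empty batch in the metadata zone, one directory per batch.
    stream.map(_._2).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/metadata/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```
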
Confidential

Data Transporter

  • Successfully migrating a 2-petabyte traditional Teradata data warehouse into Hadoop.
  • Implemented data ingestion and data provisioning using Apache Spark's DataFrames.
  • Ingested data from MPP databases such as Teradata, Greenplum and Exadata, and from databases such as Oracle, SQL Server, DB2, MySQL and Postgres into HDFS, and provisioned the same (see the JDBC ingestion sketch after this list).
  • Implemented a Kafka messaging system to publish/consume events and store them in the meta store (a producer sketch also follows this list).
  • Converting the Kafka-consuming part to Spark Streaming by setting up Kafka receivers.
  • Implemented federated querying joining databases, hive tables and HDFS data.
  • Created on-the-fly Impala/Hive tables for BI reporting when data landed in HDFS.
  • Converted SAS analytical data-manipulation code into R, exposing the Impala tables.
  • Initiated the Oracle data warehouse to Cassandra migration POC, design and data modeling.
  • Migrating the big metadata audit table into HBase, as it is written once and read many times.
  • Exposed it as a REST service using Spray for Scala and Jersey for Java.
  • Led the team in building the Data Registration Portal using Spring MVC, jQuery, Bootstrap, etc.
  • Implemented a proof of concept deploying this product on Amazon Web Services (AWS).
  • Implementing a proof of concept with Elasticsearch, Logstash and Kibana (the ELK stack).
  • Implemented data partitioning, data compression, data indexing and data encryption.
  • Created Hive tables using Parquet file formats and processed data via Impala and Spark SQL.
  • Implemented SLA check, generating alerts, email notifications and reconciliation.
  • Implemented validations of user inputs, data cleansing, technical data quality and profiling.
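
A sketch of the DataFrame-based ingestion described above, pulling a relational table over JDBC in parallel and landing it as a Parquet-backed Hive table; the connection details, table name and partition bounds are illustrative:

```scala
// Parallel JDBC pull from a relational source, landed as a Parquet Hive table.
import org.apache.spark.sql.SparkSession

object JdbcIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("jdbc-ingest").enableHiveSupport().getOrCreate()

    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .option("partitionColumn", "ORDER_ID") // numeric column used to split reads
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")         // 16 parallel JDBC connections
      .load()

    // Registering the landed data as a Hive table makes it visible to Impala
    // (after a metadata refresh) and to Spark SQL.
    orders.write.mode("overwrite").format("parquet").saveAsTable("raw.orders")

    spark.stop()
  }
}
```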
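And a minimal sketch of publishing an event to Kafka with the standard producer API, of the kind the messaging system above would emit; the broker, topic and payload shape are illustrative:

```scala
// Publishing a job-lifecycle event with the standard Kafka producer API.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventPublishSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Key = job id, value = JSON status payload (both illustrative).
    producer.send(new ProducerRecord("etl-events", "job-42", """{"status":"SUCCEEDED"}"""))
    producer.close()
  }
}
```
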
Confidential

Project Lead

  • As a certified Confidential Lead System Architect, designed the enterprise class structure.
  • Designed & led the implementation of the entire application end to end on a very short deadline; responsible from gathering business requirements through PROD releases to obtaining business sign-off.
  • Designed & implemented the Social BPM case-following feature end to end.
  • Designed & implemented an Advanced Agent with custom queue functionality to avoid lock issues.
  • Designed & implemented a Standard Agent for asynchronous update of payments through services; this design improved the user experience by not keeping users waiting in the UI.
Confidential

Principal Consultant

  • Designed & implemented a 360-degree view of every detail of a producer in the organization, with producer demographics, licenses, appointments, contracts and payment schedules.
  • Accomplished designing and developing this SaaS (Software as a Service) project serving multiple customers using Oracle, SQL Server and DB2.
Confidential

Systems Engineer

  • Implemented the CSR WLI process to investigate whether the TN exists in the CSR repository by communicating through the IREP Gateway; it stores the search results, which are shown to the user as the investigation report.
  • Accomplished building the middle layer in WLI and the SOAP web services using J2EE.

Software Engineer

  • Implemented the backend logic and UI for processing the customer's request with documents, verifying eligibility and resolving the query; after verification, the servicing center handles midterm adjustments, EMI rating, prepayment and collections.
  • Accomplished developing the UI forms, SQL queries and business logic in Java code.
