
Big Data Architect Resume


Wilmington, DE

SUMMARY

  • 10 years of overall experience in designing, implementing, and supporting Big Data applications for ETL and data warehousing projects using tools such as Apache Spark, Jenkins, Azkaban, Cassandra, and Databricks.
  • Hands-on experience with Hadoop technologies such as Hive, Sqoop, Kafka, ZooKeeper, and Apache Spark.
  • Expertise in understanding the data requirements for designing and implementing optimal enterprise applications.
  • Architected highly optimized data pipelines for clients that reduced time-to-process latency by 60%.
  • Initiated and led a team of cross-functional engineers to deliver functional changes for on-time product delivery.
  • Implemented highly efficient data pipelines capable of processing large structured, semi-structured, and unstructured datasets in support of Big Data applications (a minimal sketch follows this list).
  • Experience in NoSQL databases such as MongoDB and Cassandra.
  • Hands-on experience setting up Apache Spark, Cassandra, and MongoDB infrastructure.
  • Designed and developed proof-of-concept machine learning applications using Spark MLlib.
  • Designed infrastructure for the most efficient usage of Apache Spark clusters in Docker.
  • Experience with Apache Spark clusters and stream processing using Spark Streaming.
  • Designed Continuous Integration and Continuous Deployment pipelines for various teams.
  • Conducted cross-team sessions on best practices for using the data pipelines.
  • Excellent leadership, problem-solving, and time management skills.
  • Polyglot abilities with experience in Scala, Java, shell scripting, Python, and R.
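
For context, a minimal sketch of the kind of batch pipeline described above, written in Scala against the Spark DataFrame API. The paths, column names, and application name are hypothetical placeholders, not taken from any specific project.

    // Minimal Spark batch pipeline: read semi-structured JSON, apply a light
    // transformation, and write partitioned Parquet. All names are illustrative.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object IngestJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("sample-ingest")
          .getOrCreate()

        val raw = spark.read.json("hdfs:///landing/events/")   // semi-structured input

        val cleaned = raw
          .filter(col("event_id").isNotNull)                   // drop rows missing the key
          .withColumn("event_date", to_date(col("event_ts")))  // derive a partition key

        cleaned.write
          .mode("overwrite")
          .partitionBy("event_date")                           // partition pruning for readers
          .parquet("hdfs:///curated/events/")

        spark.stop()
      }
    }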

TECHNICAL SKILLS

Programming Languages: Scala, Java, Shell Scripting, Python, R

Database Management Systems: Cassandra, MongoDB, Oracle, Hive

Query Languages: CQL (Cassandra Query Language), ANSI SQL, HiveQL

Big Data Technologies: Spark Framework, Azkaban, Presto, Databricks, Tableau, Machine Learning, Neo4j, Kafka, Hadoop MapReduce, Pig, HBase

Cloud Technologies: Amazon AWS (Novice), Microsoft Azure (Novice)

Build & Deploy Tools: Maven, Gradle, Jenkins

Hardware: Backup and recovery management; installing and configuring peripherals, components, and drivers; LAN/router setup; VPN setup

Orchestration/DevOps: Docker, Jenkins

PROFESSIONAL EXPERIENCE

Confidential

Big Data Architect

Responsibilities:

  • Designing a solution for moving an on-prem data warehouse to a scalable cloud platform.
  • Performing in-depth product and cost analysis for cloud migration.
  • Designing a cross-domain platform that captures a 360-degree view of the business for prospect targeting.
  • Evaluating cloud products to best suit migration requirements.
  • Designing a real-time application using MapR Streams and Spark Streaming (see the sketch after this list).
  • Designed and modeled the Hive and HBase ingestion layer.
  • Implemented a Hadoop/Spark/Scala-based application with optimized performance.
  • Generated reports using a Spark/Scala application.
  • Reduced time-to-delivery for the reporting layout by 40% through extensive code reviews and fine-tuning of existing applications.
  • Involved in onsite-offshore coordination and daily team meetings with all developers.
  • Directly involved in discussions with stakeholders and business analysts to understand requirements and provide input as needed.
  • Compliant in adhering to strict SLAs as agreed with technology and business teams.
  • Proficient in following agile methodology for all deliverables.
  • Actively involved in creating stories and grooming them with the scrum master for the team.
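
A minimal sketch of the streaming design mentioned above, in Scala. MapR Streams is API-compatible with Kafka, so the example uses Spark Structured Streaming's Kafka source; the broker address, topic name, columns, and window sizes are hypothetical placeholders.

    // Illustrative stream read from a Kafka-compatible topic with a windowed count.
    // Broker, topic, and column choices are assumptions, not project specifics.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object StreamJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sample-stream").getOrCreate()

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

        val counts = events
          .withWatermark("timestamp", "10 minutes")        // bound state for late data
          .groupBy(window(col("timestamp"), "5 minutes"))  // 5-minute tumbling windows
          .count()

        counts.writeStream
          .outputMode("update")
          .format("console")                               // sink is a placeholder
          .start()
          .awaitTermination()
      }
    }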

Confidential, Wilmington, DE

Big Data Architect/Tech Lead

Responsibilities:

  • Architected a highly scalable and optimized framework in Scala Spark for source data validation; this module now serves as the primary step for all ingestion processes (an illustrative sketch follows this list).
  • Re-designed several Hive and Greenplum (Presto) tables to improve pull-performance metrics, reducing load on the cluster and improving time-to-delivery of reports.
  • Designed and implemented ingestion pipelines using Hadoop/Spark as the processing engine to handle high-volume data influx into Hive tables.
  • Leading onsite and offshore teams to meet business requirements within the allotted SLA.
  • Involved in triaging production issues in the existing pipelines with the L2 and business teams, including but not limited to probing, reporting, and fixing the issues.
  • Extensively used Hadoop/HDFS for intermediate storage.
  • Actively involved in direct business-facing conversations with product owners and data domain owners for requirement gathering and understanding.
  • Involved in cluster monitoring with the L2 team to identify pain points and further optimize the ingestion pipeline, getting the most out of the existing cluster in the most cost-effective way.
  • Implemented the CI/CD pipeline for the current project using a Chase-internal tool that operates out of the box on Jenkins for all Spark jobs.
  • Currently leading a team working on the design and implementation of common utilities in Java Spark as User Defined Transformations, viz. a JSON/CSV parser, HDFS-to-HDFS ingestion, and a data-split UDT to be used in data churning and curation.
  • Trained and modernized the team in best practices for Git and the development process in IntelliJ.
  • Evaluated new tools while following the enterprise architecture guidelines.
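
An illustrative sketch, in Scala Spark, of the shape such a source-data-validation step can take: confirm required columns exist and that key fields are non-null before an ingest proceeds. The column names, table names, and checks are hypothetical, not the actual framework.

    // Hypothetical source-data-validation step: structural and null checks
    // gate the write into the staging Hive table.
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    object SourceValidator {
      def validate(df: DataFrame, required: Seq[String], keyCol: String): Either[String, DataFrame] = {
        val missing = required.filterNot(df.columns.contains)
        if (missing.nonEmpty) Left(s"missing columns: ${missing.mkString(", ")}")
        else {
          val nullKeys = df.filter(col(keyCol).isNull).count()
          if (nullKeys > 0) Left(s"$nullKeys rows have a null $keyCol")
          else Right(df)
        }
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("source-validation")
          .enableHiveSupport()
          .getOrCreate()

        val src = spark.read.option("header", "true").csv("hdfs:///landing/accounts.csv")

        validate(src, Seq("account_id", "balance"), "account_id") match {
          case Right(ok) => ok.write.mode("append").saveAsTable("staging.accounts") // proceed
          case Left(err) => sys.error(s"validation failed: $err")                   // halt the pipeline
        }
      }
    }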

Confidential, Minneapolis MN

Big Data Architect

Responsibilities:

  • Architected an optimized and efficient plug-and-play framework in Scala Spark to facilitate most of the heavy ETL operations in Hadoop/Spark.
  • Designed ETL processes to extract secure data in XML, CSV, and fixed-width formats from legacy mainframe systems to hydrate the AWS S3 data lake.
  • Developed common utility packages for custom job auditing, e-mail failure alerts, date-time conversion, and a custom write-data-to-file feature.
  • Devised an out-of-the-box design that reduced data transfer latency by ~20%.
  • Assisted the DevOps team in setting up separate build environments for dev, QA, pre-production, and production in Jenkins.
  • Integrated the data application with a Slack bot to send Slack messages in the event of a failure.
  • Designed a pragmatic solution to maintain the sanity of the data lake by capturing bad/corrupt data (see the sketch after this list).
  • Used Hadoop/HDFS as the underlying foundational storage for the data layer.
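
One common way to implement the bad/corrupt-data capture described above (an assumption about the approach, with hypothetical paths and schema): read in PERMISSIVE mode and route rows whose corrupt-record column is populated to a quarantine location.

    // Read CSV permissively, quarantine unparseable rows, curate the rest.
    // Paths and schema are illustrative placeholders.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object QuarantineIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("quarantine-ingest").getOrCreate()

        val schema = new StructType()
          .add("id", LongType)
          .add("amount", DoubleType)
          .add("_corrupt_record", StringType)      // filled in when a row fails to parse

        val raw = spark.read
          .schema(schema)
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .csv("s3a://data-lake/landing/transactions/")
          .cache()                                 // required before filtering on the corrupt column

        raw.filter(col("_corrupt_record").isNotNull)
          .write.mode("append").json("s3a://data-lake/quarantine/transactions/")

        raw.filter(col("_corrupt_record").isNull)
          .drop("_corrupt_record")
          .write.mode("append").parquet("s3a://data-lake/curated/transactions/")
      }
    }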

Confidential, Deerfield, IL

Senior Data Engineer/ Tech Lead - Big Data (Spark/ Scala/Cassandra)

Responsibilities:

  • Architected and designed data ingestion pipelines from a Cassandra database using the Spark 1.6.1 framework for Walgreens Photo eCommerce (a sketch follows this list).
  • Set up the Spark infrastructure for Walgreens eCommerce.
  • Developed scripts to automate build and deploy activities so the QA team could execute the jobs. Presented and educated the QA and support teams on basic concepts and best practices for working with Spark and Cassandra.
  • Guided teams of junior developers in best practices for programming in Scala, shell scripting, and Gradle.
  • Conducted meetings with clients to understand requirements, and daily offshore calls to guide junior developers through the ongoing development process.
  • Developed prototype applications as proofs of concept for use cases, viz. a recommender model and a Drools rule engine on Spark.
  • Used the Hadoop/HDFS ecosystem as the data layer for temporary table creation.
  • Worked on a proof of concept for setting up Spark clusters using Docker.
  • Recognized for optimizing jobs, reducing run-time latency from 14 hours to 3 hours.
  • Recognized for providing productive and effective solutions to several engineering problems.
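
A sketch of what such a Cassandra-to-Spark read looked like with the DataStax spark-cassandra-connector in the Spark 1.x RDD style. The keyspace, table, and case-class fields are hypothetical.

    // Typed read of a Cassandra table into an RDD, with a per-customer rollup.
    // Keyspace/table/column names are illustrative only.
    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    case class Order(orderId: String, customerId: String, total: Double)

    object CassandraIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("cassandra-ingest")
          .set("spark.cassandra.connection.host", "cassandra-host")
        val sc = new SparkContext(conf)

        val orders = sc.cassandraTable[Order]("shop", "orders")  // connector maps order_id -> orderId
        val totals = orders
          .map(o => (o.customerId, o.total))
          .reduceByKey(_ + _)                                    // aggregate spend per customer

        totals.saveAsTextFile("hdfs:///reports/customer_totals")
        sc.stop()
      }
    }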

Confidential, Burbank, CA

Data Engineer Consultant - Hadoop Developer (Spark Platform)

Responsibilities:

  • Developed an ELT Java Spring Batch framework for validating tables in the data lake, hosted as a Spark system with AWS S3 as the metastore.
  • Developed Python scripting modules for scheduling jobs on Azkaban RHEL servers.
  • Developed modules in the Spark application to scrape data from external websites with improved efficiency and parallelism.
  • Maintained, organized, and structured data in the AWS S3 data lake for the analysts.
  • Improved Hive query performance on the data lake from 9 minutes to 40 seconds (see the sketch after this list).
  • Designed and implemented the architecture for data ingest projects from sources viz. Nielsen Co., Synergy, iSpotTV, and Crimson Hexagon.
  • Built a high-performance Hive table for Disney Movie Anywhere (DMA) customer details.
  • Automated the data ingestion process for Nielsen and Synergy.
  • Implemented an optimized approach for storing data from external Hive tables on Hadoop.
  • Supported the DevOps team in optimizing the new Spark 1.4.1 environment to suit the business.
  • Assisted the analytics team in troubleshooting critical issues such as missing data, inconsistent data, and garbage data, as well as optimization and query analysis.
  • Developed Python pipelines on the Databricks platform.
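
The usual technique behind a Hive speedup of that magnitude is rewriting a raw external text table into a partitioned columnar table, so queries prune partitions and skip unneeded columns. The sketch below assumes that approach, with hypothetical table and column names, and uses the modern SparkSession API for brevity rather than the Spark 1.4.1 API of that era.

    // Rewrite a slow raw table as partitioned Parquet for partition pruning
    // and column pruning. All identifiers are placeholders.
    import org.apache.spark.sql.SparkSession

    object OptimizeTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("optimize-hive-table")
          .enableHiveSupport()
          .getOrCreate()

        spark.table("raw.viewership")        // slow external text table
          .write
          .mode("overwrite")
          .format("parquet")                 // columnar storage: read only needed columns
          .partitionBy("view_date")          // partition pruning on the date predicate
          .saveAsTable("curated.viewership")

        spark.stop()
      }
    }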

Confidential

Graduate Programmer

Responsibilities:

  • Developed and maintained applications on IIS Web server.
  • Developed Oracle database entities for student thesis database.
  • Developed Java Spring Batch applications for the Graduate College.
  • Administered and maintained databases for Office of Research, Office of Compliance, and Student Thesis database to provide end user support.
  • Administered and maintained database for Program of Study for the University.
  • Maintained the web portal for faculty, staff, and students.

Confidential

Responsibilities:

  • Prepared lectures for undergraduate students.
  • Developed and supervised various laboratory activities for computer science courses.
  • Assisted students in recruitment and outreach activities.
  • Evaluated department programs and participated in committees.
  • Monitored department programs and associated activities.
  • Courses conducted: Compiler Design, Formal Language and Automata Theory, Data Structures, Concepts of C Programming, and Object-Oriented Programming with C++.

Confidential

Technical Associate

Responsibilities:

  • Gathered and analyzed Siebel ERP requirements in BRDs in an Agile environment.
  • Developed, unit-tested, and integration-tested Siebel ERP enhancements.
  • Performed code migrations to production environments and troubleshot issues during production releases.
  • Resolved critical server-related problems reported in the COX Siebel development and test Linux environments, and developed shell scripts to automate the code compilation process.
