
Bigdata Architect/Lead Developer Resume


Clinton, NJ

SUMMARY

  • Confidential is a Certified Bigdata Hadoop Architect/Lead Developer with more than fifteen years of IT experience in the architecture, design, and development of highly scalable, secure systems and solutions.
  • He has architected and led both strategic and complex implementation projects using Bigdata Hadoop technologies for prestigious clients over the past five years, and has worked at client locations in the US, Sweden, Germany, and Singapore to deliver products and projects.
  • He has managed and led teams of 4 to 5 developers in an onsite/offshore model. Confidential is an AWS Certified Solutions Architect and has designed and architected applications hosted in the AWS environment.
  • His expertise lies in using Hadoop 1.x and 2.x and ecosystem tools such as HDFS, Hive, Storm, Spark, Scala, HDF Kafka, AWS RDS, Kerberos, NiFi, Logstash, Elasticsearch, Flume, Sqoop, HBase, Hue, and Ambari.
  • He has experience using scheduling mechanisms such as Oozie, Airflow, and AWS Data Pipeline to build complex workflows, and is proficient with various AWS services including DynamoDB, EMR, S3, CloudFormation, EC2, IAM, and VPC.
  • He has implemented data governance and data security using Cloudera Navigator, Sentry, and Ranger, and has experience installing, configuring, and upgrading Hadoop clusters using Cloudera Manager and Ambari.
  • He has used the Cloudera and Hortonworks Hadoop distributions to implement Bigdata solutions and also has strong experience with Microsoft Visio, Maven, Gradle, JUnit, and IDEs such as Eclipse and IntelliJ.
  • He has worked with development, operations, and testing teams during new system deployments and has taken part in the various activities of Agile methodology. He has used JIRA for task management, bug tracking, and time reporting.
  • He is highly experienced in all phases of software development, including study, analysis, development, testing, implementation, and maintenance of batch and real-time systems. Confidential has excellent interpersonal and communication skills.

TECHNICAL SKILLS

Bigdata/Hadoop technologies: HDFS, Spark, Spark SQL, Storm, Hive, Kafka, NiFi, Oozie, Airflow, Sqoop, Logstash, Elasticsearch, Cloudera Manager, Navigator, Sentry, Impala, HBase, Ambari, Kerberos, Ranger

Programming languages: Java, Scala, C++, Unix Shell script

Hadoop distributions: Cloudera CDH (5.12.1), Hortonworks HDP

Reporting/visualization tools: Hue, Zeppelin, Kibana, Jaspersoft

Cloud (AWS) technologies: S3, EC2, EMR, CloudFormation, DynamoDB, VPC, IAM, etc

Methodologies and techniques: Agile (Scrum), Waterfall model

Systems architecture: Object-oriented Analysis and Design, UML

Other technologies: Data structures, Multithreading, JSON, C

Database: MySQL, SQLite, AWS RDS

Operating systems: Linux, Windows XP, Windows 7

Tools: IDEs (Eclipse, IntelliJ), Jenkins

Version control system: Git, Subversion (SVN), VSS

Dev management tools: Confluence, JUnit, Maven, Gradle, Jenkins, GitLab, MPP

Other tools: JIRA, PuTTY, Bugzilla, Notepad++, Total Commander, Slack

PROFESSIONAL EXPERIENCE

Bigdata Architect/Lead Developer

Confidential, Clinton, NJ

Responsibilities:

  • Development followed Agile practices; participated in daily scrum meetings, sprint planning, sprint reviews, and sprint backlog refinement discussions.
  • Responsible for preparing the domain model for the beneficiary module, which involved identifying the entities and their relationships.
  • Designed and developed a Spark application in Scala to transform contract and beneficiary data into JSON as per business needs (a minimal sketch follows this list).
  • Prepared the data model for the NoSQL database (DynamoDB); a JSON-based object model was prepared to store the data in the DynamoDB table.
  • Designed and developed a synthetic data generation tool using Faker; the tool was used to prepare millions of records to test various beneficiary scenarios.
  • Designed the strategy for a one-time load of beneficiary data (around 13 million records) from the current Oracle database to DynamoDB.
  • Developed high-availability functionality for the DynamoDB streaming application.
  • Developed a utility to load around twenty million records into Kafka for data preparation.
  • Prepared an interface document describing the transformation rules for migrating beneficiary data from the Oracle database to DynamoDB.
  • Evaluated approaches for Change Data Capture (CDC) on RDS (PostgreSQL).
  • Designed and developed an application to capture change data from the DynamoDB stream and insert it into Kafka; the same change data is consumed by downstream systems (see the CDC sketch after this list).
  • Designed and implemented auditing functionality; audit data is loaded from Kafka into the Elasticsearch cluster using Logstash.
  • Performed reconciliation of the one-time load from the current system to DynamoDB.
  • Prepared wiki pages for all design and development work.
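
Illustrative only: a minimal sketch of the kind of Spark/Scala transformation described in the bullets above, shaping flat beneficiary records into JSON documents for a downstream DynamoDB loader. The column names, document structure, and HDFS paths are assumptions, not the actual project schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object BeneficiaryToJson {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("beneficiary-json-transform")
          .getOrCreate()

        // Read the source extract (assumed to be CSV with a header row)
        val beneficiaries = spark.read
          .option("header", "true")
          .csv("hdfs:///data/raw/beneficiary/")          // illustrative path

        // Shape each row into the nested document expected by the DynamoDB item model
        val documents = beneficiaries
          .withColumn("fullName", concat_ws(" ", col("first_name"), col("last_name")))
          .select(
            col("beneficiary_id"),
            struct(col("fullName"), col("dob"), col("contract_id")).as("profile")
          )

        // Write one JSON document per record; a separate loader puts them into DynamoDB
        documents.write.mode("overwrite").json("hdfs:///data/derived/beneficiary_json/")

        spark.stop()
      }
    }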
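
The DynamoDB Streams to Kafka CDC application could look roughly like the sketch below, using the AWS SDK for Java (v1) and the Kafka producer API. The stream ARN, topic name, and broker address are placeholders; checkpointing, multi-shard handling, and error handling are omitted for brevity.

    import java.util.Properties
    import scala.collection.JavaConverters._

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder
    import com.amazonaws.services.dynamodbv2.model.{DescribeStreamRequest, GetRecordsRequest, GetShardIteratorRequest, ShardIteratorType}
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object DynamoCdcToKafka {
      def main(args: Array[String]): Unit = {
        val streamArn = "arn:aws:dynamodb:us-east-1:123456789012:table/beneficiary/stream/label"  // placeholder
        val streams   = AmazonDynamoDBStreamsClientBuilder.standard().build()

        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")    // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        // Read a single shard for illustration; a production consumer tracks every shard
        val shard = streams.describeStream(new DescribeStreamRequest().withStreamArn(streamArn))
          .getStreamDescription.getShards.asScala.head
        var iterator = streams.getShardIterator(
          new GetShardIteratorRequest()
            .withStreamArn(streamArn)
            .withShardId(shard.getShardId)
            .withShardIteratorType(ShardIteratorType.LATEST)).getShardIterator

        while (iterator != null) {
          val result = streams.getRecords(new GetRecordsRequest().withShardIterator(iterator))
          result.getRecords.asScala.foreach { record =>
            // Publish the new item image so downstream consumers see every change
            val newImage = record.getDynamodb.getNewImage
            producer.send(new ProducerRecord[String, String]("beneficiary-cdc", String.valueOf(newImage)))
          }
          iterator = result.getNextShardIterator
          Thread.sleep(1000)    // simple poll interval
        }
        producer.close()
      }
    }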

Environment: Java, Spark, Spark Streaming, Spark SQL, Scala, Bash, HDFS, AWS DynamoDB, DynamoDB Streams, HDF Kafka, AWS RDS, Kerberos, Jenkins, Git, Ranger, Ambari

Bigdata Architect/Lead Developer

Confidential

Responsibilities:

  • Responsible for requirements gathering, design, and development; actively involved in mentoring a team of 4 developers and interacting with the product owner and business analyst.
  • Development followed Agile practices; participated in daily scrum meetings, sprint planning, sprint reviews, and sprint backlog refinement discussions.
  • Involved in architecting batch data ingestion: input file naming conventions, the FTP landing zone, ingestion frequency, and the structure and partitioning mechanism of the HDFS raw, core, and derived data layers for each data source.
  • Implemented NiFi flows to ingest data from the source FTP landing zone to the HDFS raw layer; a Slack channel was used to report any failures or data duplication during ingestion.
  • Implemented Airflow and Oozie workflows to transform data from the raw to the core layer and from the core to the derived layer (ETL) for each data source.
  • Designed and developed Airflow workflows using Python and Bash scripts instead of XML-based Oozie, and migrated legacy Oozie workflows to Airflow; this cut workflow development time and significantly reduced troubleshooting and monitoring effort.
  • Implemented the raw-to-core Spark job to validate input data in formats such as CSV, XML, and Avro, and to transform and store the data in the core layer in Avro format (a minimal sketch follows this list).
  • Implemented pseudonymization functionality as part of the transformation to hide customers' personal information; HBase stores the pseudonymized data with respect to predefined key-spaces such as SSN, MSISDN, and IMSI (see the hashing sketch after this list).
  • Implemented Hive queries to create derived datasets from core or existing derived data; the derived data is stored in Parquet format and used by analysts to make business decisions.
  • Implemented a utility to process and load data directly from the raw to the derived layer using a core Java Spark job (DataFrames and SQL).
  • Built Hadoop authentication using Kerberos; Sentry was used for authorization and Navigator for data governance.
  • Extensively used Hue (3.0 and 4.0) to troubleshoot Oozie workflows and to browse and query data.
  • For reporting and visualization, used Zeppelin notebooks with Spark and Cognos with Impala.
  • Used the Cloudera CDH 5.x distribution to process and store the data.
  • Actively involved in implementing compliance with the EU's GDPR for customers' personal data.
  • Worked on Gradle multi-project dependencies
  • Improved performance by detecting and fixing critical issues in reusable components
  • Followed software best practices such as Agile methodology, continuous integration with GitLab, and code formatting via a customized IntelliJ formatter.
  • Used IntelliJ as the IDE and Git for version control; wrote test cases in JUnit and measured code coverage with Cobertura. Involved in preparing the sprint backlog, estimating story points, and reporting bugs in JIRA.
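
A hedged sketch of a raw-to-core job as described above: validate delimited input and persist it to the core layer as Avro. The validation rule, column names, and paths are assumptions, and the Avro write assumes the spark-avro package is available on the cluster.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object RawToCore {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("raw-to-core").getOrCreate()

        // Raw layer: CSV as delivered by the source system (illustrative path)
        val raw = spark.read.option("header", "true").csv("hdfs:///data/raw/usage/2018-01-01/")

        // Basic validation: reject rows with a missing business key
        val valid    = raw.filter(col("record_id").isNotNull)
        val rejected = raw.filter(col("record_id").isNull)

        // Core layer: Avro, partitioned by load date
        valid
          .withColumn("load_date", lit("2018-01-01"))
          .write
          .partitionBy("load_date")
          .format("avro")
          .mode("append")
          .save("hdfs:///data/core/usage/")

        // Keep rejected rows for reconciliation
        rejected.write.mode("append").csv("hdfs:///data/reject/usage/")

        spark.stop()
      }
    }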
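
A minimal sketch of column-level pseudonymization using salted SHA-256 hashes; the salt and column names are assumptions, and the per-key-space pseudonym mappings kept in HBase by the real implementation are not shown here.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions._

    object Pseudonymize {
      // Replace each listed identifier column with a salted hash of its value,
      // so the same input always maps to the same pseudonym.
      def apply(df: DataFrame, salt: String, columns: Seq[String]): DataFrame =
        columns.foldLeft(df) { (acc, c) =>
          acc.withColumn(c, sha2(concat(lit(salt), col(c)), 256))
        }
    }

    // Example usage (hypothetical column names):
    //   Pseudonymize(coreDf, "example-salt", Seq("ssn", "msisdn", "imsi"))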

Environment: Cloudera CDH 5.12.1, Core Java, Python, Bash, HDFS, Spark, Spark SQL, Spark DataFrames, Hive, Oozie, Airflow, HBase, Hue, Git, GitLab, JIRA, NiFi, Sentry, Navigator, IntelliJ, Gradle, Maven, Avro, Parquet, Slack, Kerberos

System Analyst

Confidential

Responsibilities:

  • Responsible for requirements gathering and analysis; development was based on Agile practices, and the IDE used was Eclipse.
  • Involved in building proofs of concept (PoCs) for various Bigdata tools and technologies to build competencies.
  • Involved in writing proposals for various Bigdata and cloud-related projects.
  • Designed and implemented a Storm Kafka consumer spout in core Java to pull data from the Kafka cluster.
  • Used core Java to develop a Storm HBase connector bolt to write data into HBase tables such as collectors, devices, and objects (see the sketch after this list).
  • Involved in creating and managing Kafka topics and in managing the on-premise HDP cluster using Ambari.
  • Extensively used Hue to debug Hive and Impala queries; provisioned AWS resources for EMR, Data Pipeline, and S3 within a Virtual Private Cloud (VPC).
  • Implemented Hive and Impala queries to transform and access data from various sources such as Google Analytics, Twitter, and Instagram.
  • Prepared daily, weekly, and monthly reports on published blog content, with Jaspersoft as the reporting tool.
  • Secured the AWS environment using IAM and monitored the cluster using CloudWatch.
  • Implemented a Data Pipeline workflow to process data using EMR.
  • Architected and designed the AWS environment to host on-premise applications migrated to the AWS cloud.
  • Developed AWS infrastructure as code to set up development, testing, and production environments using CloudFormation.
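
The Storm-to-HBase connector bolt was written in core Java; the sketch below illustrates the same idea in Scala, assuming a Storm 1.x classpath (org.apache.storm packages; HDP 2.1-era Storm used the backtype.storm names) and the HBase 1.x client API. The table, column family, and tuple field names are illustrative.

    import java.util.{Map => JMap}

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put, Table}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.storm.task.{OutputCollector, TopologyContext}
    import org.apache.storm.topology.OutputFieldsDeclarer
    import org.apache.storm.topology.base.BaseRichBolt
    import org.apache.storm.tuple.Tuple

    class HBaseWriterBolt extends BaseRichBolt {
      private var collector: OutputCollector = _
      private var connection: Connection = _
      private var table: Table = _

      override def prepare(conf: JMap[_, _], context: TopologyContext,
                           collector: OutputCollector): Unit = {
        this.collector = collector
        connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        table = connection.getTable(TableName.valueOf("devices"))   // illustrative table name
      }

      override def execute(input: Tuple): Unit = {
        // One HBase row per incoming tuple, keyed by the device id field
        val put = new Put(Bytes.toBytes(input.getStringByField("deviceId")))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                      Bytes.toBytes(input.getStringByField("payload")))
        table.put(put)
        collector.ack(input)
      }

      override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = ()

      override def cleanup(): Unit = {
        table.close()
        connection.close()
      }
    }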

Environment: HDP 2.1, Core Java, Hadoop, Hive, Impala, Kafka, Storm, HBase, AWS, EMR, CloudFormation, S3, IAM, CloudWatch, VPC, Eclipse, Ambari
