Big Data Architect Resume

Phoenix, AZ

SUMMARY

  • Over 6 years of IT experience across the phases of the software development life cycle (SDLC), including planning, design, coding and testing, developing software applications in the risk area.
  • 4+ years of experience with the Hadoop ecosystem, with good knowledge of MapReduce, YARN, HDFS, Hive, Scala and Spark.
  • 3+ Years of experience in Finance Domain and Risk Area.
  • Good experience in NoSQL such as HBase and Cassandra.
  • Developed Python scripts to automate and monitor jobs.
  • Hands-on experience using Spark with Scala for large-scale data processing in streaming pipelines.
  • Good understanding of machine learning libraries.
  • Working knowledge of object-oriented programming (OOP) principles, design and development, with a good understanding of programming concepts such as data abstraction, concurrency, synchronization, multithreading and thread communication, networking and security.
  • Extensive experience applying best practices wherever possible in the application development process, such as using the Model-View-Controller (MVC) approach for better control over application components.
  • Developed Hive queries and automated them to run analyses on an hourly, daily and weekly basis.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Deep understanding of data warehouse approaches, industry standards and industry best practices.
  • Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
  • Strong development experience in Apache Spark using Scala.
  • Skilled in creating workflows using Oozie for cron jobs.
  • Experienced in writing custom UDFs and UDAFs for extending Hive and Pig core functionalities.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS), Teradata and vice versa.
  • Experience in creating High Level Design and Detailed Design in the Design phase.
  • Experience in identifying Bottlenecks in ETL Processes and Performance tuning of the production applications using Database Tuning, Partitioning, Index Usage, Aggregate Tables, Session partitioning, Load strategies, commit intervals and transformation tuning.

TECHNICAL SKILLS

ETL Tools/Data Modeling Tools: Informatica PowerCenter, PowerExchange 10.1/9.x/8.x/7.1 (Repository Manager, Designer, Server Manager, Workflow Monitor, Workflow Manager), MSBI, Erwin, Fact and Dimension Tables, Physical and Logical Data Modeling, Star Join Schema Modeling.

Databases: MS SQL Server, MS Access, SQL, PL/SQL

Tools: Toad, SQL developer, Visio, Magellan

Big Data Ecosystem: HDFS, Oozie, Hive, Pig, Sqoop, ZooKeeper, HBase, Spark, Scala

Languages: SQL, PL/SQL, T-SQL, UNIX, Shell Scripting, Batch Scripting, Python

Operating Systems: UNIX, Windows Server 2008/2003, Linux

Job Scheduling: Control-M, CA AutoSys, Event Engine, cron jobs

PROFESSIONAL EXPERIENCE

Confidential, Phoenix, AZ

Big Data Architect

Responsibilities:

  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Hive.
  • Created data models and database designs per technical requirements.
  • Envisioned and designed initial models and the end-to-end workflow for the Risk Application by working with clients and product managers.
  • Defined data extraction and data ingestion strategies for moving legacy applications onto the Big Data Platform as part of the migration.
  • Developed Python scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Migrated an existing Java application to microservices using Spring Boot and Spring Cloud.
  • Working knowledge in different IDEs like Eclipse, Spring Tool Suite.
  • Working knowledge of Git and Ant/Maven for project dependency management, build and deployment.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Experienced with batch processing of data sources using Apache Spark.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Partitioned Hive tables and ran the scripts in parallel to reduce their run time.
  • Worked with data serialization formats (Avro, Parquet, JSON, CSV, RC) to convert complex objects into byte sequences.
  • Responsible for analyzing and cleansing raw data by running Hive queries and scripts for checks such as duplicate and null checks.
  • Involved in administering, installing, upgrading and managing distributions of Hadoop, Hive and HBase.
  • Involved in performance troubleshooting and tuning of Hadoop clusters.
  • Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
  • Implemented business logic by writing Hive UDFs in Java.
  • Developed shell scripts and some Perl scripts based on user requirements.
  • As an SME, performed code reviews and designed the application architecture.

Environment: HDFS, Hive, SQL, Spark, Shell scripting, Cron Jobs.

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Involved in designing the Cassandra data model; used CQL (Cassandra Query Language) to perform CRUD operations on the Cassandra database.
  • Involved in moving all log files generated from various sources to HDFS through Flume for further processing.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed Spark scripts by using Scala Shell commands as per the requirement.
  • Developed and implemented core API services using Scala and Spark.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Partitioned Hive tables and ran the scripts in parallel to reduce their run time.
  • Worked with data serialization formats (Avro, Parquet, JSON, CSV) to convert complex objects into byte sequences.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Installed, upgraded and managed Hadoop clusters.
  • Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
  • Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
  • Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Used Oozie Operational Services for batch processing and for scheduling workflows dynamically.
  • Worked extensively on end-to-end data pipeline orchestration using Oozie.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Processed source data into structured data and stored it in the Cassandra NoSQL database.
  • Created alter, insert and delete queries involving lists, sets and maps in Cassandra.
  • Designed and developed a Java API (Commerce API) that provides connectivity to Cassandra through Java services.
  • Responsible for continuously monitoring and managing the Elastic MapReduce cluster through the AWS console.
  • Responsible for developing, supporting and maintaining the ETL (Extract, Transform and Load) processes using Informatica PowerCenter.
  • Built the load process for Dimension and Fact tables and the reporting process using Informatica.
  • Involved in data analysis for source and target systems; good understanding of data warehousing concepts, staging tables, Dimensions, Facts, Star Schema and Snowflake Schema.
  • Extracted data from various sources such as Oracle, SQL Server and flat files, then transformed and loaded it into targets using Informatica.
  • Created mappings and used transformations such as Source Qualifier, Filter, Update Strategy, Lookup, Expression, Router, Joiner, Normalizer, Aggregator, Sequence Generator and Address Validator.
  • Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loads; unit tested the mappings.
  • Evaluated the suitability of Hadoop and its ecosystem for the above project, implementing and validating various proof of concept (POC) applications with a view to adopting them as part of the Big Data Hadoop initiative.

Environment: MapReduce, HDFS, Hive, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, ZooKeeper, J2EE, Eclipse, Informatica PowerCenter.

Confidential

Java/UI Developer

Responsibilities:

  • Consolidated to a single-page application for a streamlined user experience via Angular 2 and AJAX.
  • Worked with SOAP-based and RESTful web services to fetch dynamic content from backend databases.
  • Implemented transaction management, high-level authentication and authorization, logging and exception handling using Spring Security and AOP.
  • Created and developed the internal Angular 2 framework application projects. Unit tested with Jasmine and developed Angular 2 services to retrieve JSON data from the RESTful web services and display the responses in user interface pages.
  • Integrated Jersey with Jackson to serialize Java objects to JSON and deserialize JSON to Java objects.
  • Created multiple reusable components and services using Angular 2 built-in and custom directives.
  • Used jQuery extensively to implement various GUI components in the application portal.
  • Built the system using the Model-View-Controller (MVC) architecture. Implemented the application using design patterns such as Factory, Singleton, Data Access Object, and Service Locator.
  • Implemented AngularJS client-side form validation to validate user input before passing it to the backend.
  • Involved in enhancing the existing application using AngularJS; created an HTML navigation menu.
  • Used the Cassandra database to perform CRUD operations.
  • Used the Jenkins continuous integration tool for deployments.
  • Used JIRA to track errors.

Environment: Java 8, HTML, CSS, JavaScript, Angular 2.0, AWS, Cassandra DB, SOAP, WebSphere, RESTful, PostgreSQL, Core Java, Node.js, XML, Maven 4.0, Eclipse, Ajax, jQuery, JUnit, Spring-Hibernate integration framework, Linux, Git, UML, Jenkins.