Sr. Big Data Architect Resume
Kansas City, MO
PROFESSIONAL SUMMARY:
- 10+ years of experience in the SDLC with key emphasis on trending Big Data technologies - Spark, Scala, Spark MLlib, Hadoop, Tableau, Cassandra.
- Good knowledge of Big Data and data warehouse architecture, designing Star Schema, Snowflake Schema, fact and dimension tables, and physical and logical data modeling using Erwin and ER Studio.
- Architected, designed and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data capabilities.
- Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Extensive experience in data modeling, data architecture, solution architecture, data warehousing and business intelligence concepts, and master data management (MDM).
- Expertise in architecting Big Data solutions spanning data ingestion and data storage.
- Experienced in working with NoSQL databases - HBase, Cassandra and MongoDB - including database performance tuning and data modeling.
- Extensive knowledge in architecture and design of Extract, Transform, Load (ETL) environments using Informatica PowerCenter.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience in integrating various data source definitions such as SQL Server, Oracle, Sybase, ODBC connectors and flat files.
- Experience in handling huge volumes of data moving in and out of Teradata and Big Data platforms.
- Experience in development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm and MapReduce open source tools/technologies.
- Architected, solutioned and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Strong expertise in Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
- Expertise in data analysis, design and modeling using tools like Erwin.
- Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB and NoSQL.
- Extensive experience in using Teradata BTEQ, FLOAD, MLOAD and FASTEXPORT utilities.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Experienced in using various Hadoop ecosystem components such as MapReduce, Hive, Sqoop, and Oozie.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
- Experienced in testing data in HDFS and Hive for each transaction of data.
- Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
- Strong Experience in working with Databases like Oracle 12C/11g/10g/9i, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
- Experienced in using database tools like SQL Navigator, TOAD.
- Experienced with Spark, improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the illustrative sketch following this summary).
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experienced in using Flume to transfer log data files to Hadoop Distributed File System (HDFS)
- Knowledge and experience in job work-flow scheduling and monitoring tools like Oozie and Zookeeper.
- Good experience in Shell programming.
- Knowledge in configuring and managing Cloudera's Hadoop platform along with CDH3 and CDH4 clusters.
- Knowledge and experience of the architecture and functionality of NoSQL databases like Cassandra and MongoDB.
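A minimal Scala sketch of the Spark SQL / pair-RDD style of aggregation referenced above is shown below. It is illustrative only: the input path, column names (region, amount) and object name SalesAggregation are assumptions, not artifacts from any engagement listed here.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object SalesAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("dataframe-vs-rdd-sketch").getOrCreate()

        // DataFrame / Spark SQL path: the Catalyst optimizer plans the aggregation.
        val sales = spark.read.option("header", "true").option("inferSchema", "true")
          .csv("hdfs:///data/sales.csv") // hypothetical input path and schema
        sales.createOrReplaceTempView("sales")
        val byRegion = spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

        // Equivalent pair-RDD formulation; reduceByKey combines values per partition before the shuffle.
        val totals = sales.select(col("region"), col("amount").cast("double")).na.drop().rdd
          .map(r => (r.getString(0), r.getDouble(1)))
          .reduceByKey(_ + _)

        byRegion.show()
        totals.take(10).foreach(println)
        spark.stop()
      }
    }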
TECHNICAL SKILLS
Hadoop/Big Data: Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MapReduce, HDFS, Hive, Pig, HBase, Zookeeper.
Data Modeling Tools: Erwin R6/R9, Rational System Architect, IBM InfoSphere Data Architect, ER Studio and Oracle Designer.
NoSQL Databases: Cassandra, MongoDB, DynamoDB
Database Tools: Microsoft SQL Server 12.0, Teradata 15.0, Oracle 12c/11g/9i and MS Access
Frameworks: MVC, Struts, Spring, Hibernate.
Operating Systems: UNIX, Linux, Windows, Centos, Sun Solaris.
Databases: Oracle 12c/11g/10g/9i, Microsoft Access, MS SQL
Languages: PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark
ETL/Data Warehouse Tools: Informatica 9.6/9.1/8.6.1/8.1, SAP BusinessObjects XI R3.1/XI R2.
Tools: OBIEE 10g/11g, SAP ECC 6 EHP5, GoToMeeting, DocuSign, InsideSales.com, SharePoint, MATLAB.
PROFESSIONAL EXPERIENCE
Confidential, Kansas City, MO
Sr. Big Data Architect
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Implementation of Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with Cloud Architecture.
- Designed and deployed full SDLC of AWS Hadoop cluster based on client's business need.
- Experience in BI reporting with AtScale OLAP for Big Data.
- Developed complex ETL code through Data Manager to design BI-related cubes for data analysis at the corporate level.
- Unified data lake architecture integrating various data sources on Hadoop architecture.
- Used Sqoop to import data from RDBMS sources into the Hadoop Distributed File System (HDFS).
- Involved in loading and transforming large sets of data and analyzed them by running Hive queries and Pig scripts.
- Developed software routines in R, Spark, SQL to automate large datasets calculation and aggregation.
- Integrated the NoSQL database HBase with MapReduce to move bulk amounts of data into HBase.
- Redesigned the existing Informatica ETL mappings and workflows using Spark SQL.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Ingested data into Hadoop/Hive/HDFS from different data sources.
- Wrote Scala code to run Spark jobs on the Hadoop HDFS cluster.
- Defined and managed the architecture and life cycle of Hadoop and Spark projects.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (illustrative sketch at the end of this list).
- Designed the data processing approach within Hadoop using Pig.
- Identified query duplication, complexity and dependencies to minimize migration efforts.
- Technology stack: Oracle, Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud and Dynamo DB.
- Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
- Worked as a Hadoop consultant on (Map Reduce/Pig/HIVE/Sqoop).
- Worked with Spark and Python.
- Worked using Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, Pig, and Map Reduce.
- Lead architecture and design of data processing, warehousing and analytics initiatives.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Experienced with batch processing of data sources using Apache Spark.
- Developed predictive analytics using Apache Spark Scala APIs.
- Involved in big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded the data into the tables and queried data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream the log data from servers.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Imported millions of structured records from relational databases using Sqoop import for processing with Spark, and stored the data in HDFS in CSV format.
- Developed a Spark Streaming application to pull data from the cloud into a Hive table.
- Used Spark SQL to process huge amounts of structured data.
- Assigned names to each of the columns using the case class option in Scala.
- Implemented Spark GraphX application to analyze guest behavior for data science segments.
- Enhanced the traditional data warehouse based on Star Schema, updated data models, and performed data analytics and reporting using Tableau.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4), Hortonworks Data Platform (HDP) and MapR.
- Experience in integrating Oozie logs into a Kibana dashboard.
- Extracted the data from MySQL and AWS Redshift into HDFS using Sqoop.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Expert in writing business analytics scripts using Hive SQL.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
- Worked on writing Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files and Parquet files.
- Supported the daily/weekly ETL batches in the production environment.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
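The following is a minimal, hedged Scala sketch of a Kafka-to-Hive streaming ETL job of the kind described in this list. It uses Spark Structured Streaming (which would require the spark-sql-kafka connector on the classpath); the topic name, broker list and HDFS paths are placeholders, not project artifacts.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object StreamingEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read the raw event stream from a hypothetical "transactions" topic.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker list
          .option("subscribe", "transactions")
          .load()

        // Keep the message payload as a string and stamp each record with a processing time.
        val events = raw.selectExpr("CAST(value AS STRING) AS payload")
          .withColumn("ingest_ts", current_timestamp())

        // Append micro-batches as Parquet under a location a Hive external table can point to.
        val query = events.writeStream
          .format("parquet")
          .option("path", "/warehouse/streaming/transactions")       // hypothetical HDFS path
          .option("checkpointLocation", "/checkpoints/transactions") // required for fault tolerance
          .start()

        query.awaitTermination()
      }
    }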
Environment: Big Data, Informatica, Sybase, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, Redshift, NoSQL, Machine Learning, Sqoop, MySQL.
Confidential, Auburn Hills, MI
Big Data Architect
Responsibilities:
- Architected, managed and delivered the technical projects/products for various business groups.
- Loaded all the data from our relational databases into Hive using Sqoop; also received four flat files from different vendors, all in different formats (e.g. text, EDI and XML).
- Architected all the ETL data loads coming in from the source systems and loading into the data warehouse.
- Ingested data into Hadoop/Hive/HDFS from different data sources.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and provide visualization of the ETL orchestration using the CDAP tool.
- Implemented the installation and configuration of a multi-node cluster on the cloud using Amazon Web Services (AWS) EC2.
- Experienced in working with Apache Storm.
- Implemented all the data quality rules in Informatica Data Quality.
- Involved in Oracle PL/SQL query optimization to reduce the overall run time of stored procedures.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Migrated petabyte-scale volumes of data warehouse data to HDFS.
- Utilized AWS services with a focus on big data architecture/analytics/enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful and valuable information for better decision-making.
- Experience in data cleansing and data mining.
- Designed the AWS architecture, including cloud migration, AWS EMR, DynamoDB, Redshift and event processing using Lambda functions.
- Worked on tools such as Flume, Storm and Spark.
- Performed proofs-of-concept to determine feasibility and evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
- Used Flume to collect, aggregate and store the web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
- Designed the Redshift data model and performed Redshift performance improvements/analysis.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Developed Spark jobs to transform the data in HDFS (illustrative sketch at the end of this list).
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Worked on AWS Cloud and on-premises environments with infrastructure provisioning/configuration.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
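Below is a minimal Scala sketch of the kind of Spark transformation over HDFS data mentioned in this list; the input/output paths and column names (customer_id, load_date) are hypothetical stand-ins for the actual feeds.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HdfsTransformJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hdfs-transform-sketch").getOrCreate()

        // Read raw delimited extracts landed on HDFS (paths and columns are placeholders).
        val raw = spark.read.option("header", "true").csv("hdfs:///landing/vendor_feed/")

        // Typical cleanup: trim the key, drop records missing it, standardize the load date.
        val cleaned = raw
          .withColumn("customer_id", trim(col("customer_id")))
          .filter(col("customer_id").isNotNull && col("customer_id") =!= "")
          .withColumn("load_date", to_date(col("load_date"), "yyyy-MM-dd"))

        // Write back to HDFS as partitioned Parquet for downstream Hive/reporting access.
        cleaned.write.mode("overwrite")
          .partitionBy("load_date")
          .parquet("hdfs:///curated/vendor_feed/")

        spark.stop()
      }
    }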
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, Informatica Data Quality, Informatica Metadata Manager, HDFS, Hive, MapReduce, Cassandra, Zookeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
Confidential, Chicago, IL
Sr. Data Engineer
Responsibilities:
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed data marts for the base data in Star Schema and Snowflake Schema, and was involved in developing the data warehouse for the database.
- Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required
- Extensively used Erwin as the main tool for modeling, along with Visio.
- Used R machine learning packages to predict the performance of certain samples.
- Worked on HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in forecasting based on the present results and insights derived from data analysis.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (illustrative sketch at the end of this list).
- Worked on the Metadata Repository (MRM) to keep the definitions and mapping rules up to the mark.
- Trained a couple of colleagues on the Spotfire tool and gave guidance in creating Spotfire visualizations.
- Integrated the NoSQL database HBase with MapReduce to move bulk amounts of data into HBase.
- Developed complex ETL mappings using Informatica PowerCenter and sessions using Informatica Workflow Manager.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experienced with Pig Latin operations and writing Pig UDFs to perform analytics.
- Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
- Worked on Programming using PL/SQL, Stored Procedures, Functions, Packages, Database triggers for Oracle.
- Designed various practical data visualizations, charts, dashboards, prototypes and demos, and published them in various Tableau workbooks for analytical projects and data visualization teams.
- Created a data mapping document after each assignment and wrote the transformation rules for each field as applicable.
- Analyzing data with Hive, Pig and Hadoop Streaming.
- Configured and developed the triggers, workflows and validation rules, and was hands-on with the deployment process from one sandbox to another.
- Created automatic field updates via workflows and triggers to satisfy internal compliance requirement of stamping certain data on a call during submission.
- Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships.
- Developed enhancements to the MongoDB architecture to improve performance and scalability.
- Performed forward engineering of the data models, reverse engineering of the existing data models and updates to the data models.
- Performed data cleaning and data manipulation activities using the NZSQL utility.
- Analyzed the physical data model to understand the relationships between existing tables, and cleansed the unwanted tables and columns as per the requirements as part of the duties of a Data Analyst.
- Analyzed and understood the architectural design of the project in a step-by-step process along with the data flow.
- Created DDL scripts for implementing data modeling changes; created Erwin reports in HTML and RTF formats depending upon the requirement, published the data model in the model mart, created naming convention files and coordinated with DBAs to apply the data model changes.
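A small, illustrative Scala sketch of pushing raw event lines onto a Kafka topic, the pattern referenced in the Kafka bullet above, is shown below. It uses the standard kafka-clients producer API; the broker address, the "events" topic name and the file-based input are assumptions for the example, with downstream consumers (not shown) persisting the stream to HDFS and Cassandra.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object EventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Each input line becomes a record on the "events" topic; downstream consumers
          // would load the stream into HDFS and Cassandra.
          scala.io.Source.fromFile(args(0)).getLines().foreach { line =>
            producer.send(new ProducerRecord[String, String]("events", line))
          }
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }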
Environment: Erwin r8.2, Oracle SQL Developer, Oracle Data Modeler, Informatica PowerCenter, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, HBase, Machine Learning, Hadoop, HDFS, MapReduce, Teradata 14, SSIS, R, Business Objects, SQL Server 2008, Windows XP, MS Excel.
Confidential
Data Analyst
Responsibilities:
- Performed standard management duties on the SQL Server database, including creating user accounts, monitoring daily backups and performing regular analysis of database performance to suggest improvements.
- Assisted Kronos project team in SQL Server Reporting Services installation.
- Developed SQL Server database to replace existing Access databases.
- Performed testing and analysis of databases using SQL Server analysis tools.
- Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
- Worked with business users during requirements gathering and prepared conceptual, logical and physical data models.
- Wrote PL/SQL statements, stored procedures and triggers in DB2 for extracting as well as writing data.
- Optimized the existing procedures and SQL statements for better performance, using EXPLAIN PLAN, hints and SQL TRACE to tune SQL queries.
- Developed interfaces able to connect to multiple databases such as SQL Server and Oracle.
- Designed and created web applications to receive query string input from customers and facilitate entering the data into SQL Server databases.
- Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
- Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.
- Converted logical models into physical database models to build/generate DDL scripts.
- Maintained warehouse metadata, naming standards and warehouse standards for future application development.
- Extensively used ETL to load data from DB2 and Oracle databases.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Expertise in and hands-on work with physical, logical and conceptual data models.
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.
- Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
- Extensively used ETL methodology for supporting data extraction, transformations and loading processing, in a complex EDW using Informatica.
- Worked with and experienced in Star Schema, DB2 and IMS DB.
Environment: ERWIN, UNIX, Oracle, PL/SQL, DB2, Teradata SQL assistant, DQ analyzer