Sr. Big Data Architect Resume
Washington, DC
PROFESSIONAL SUMMARY
- Over 10 years of experience as a Big Data Architect, Data Modeler, Data Architect and Data Analyst, including the design, development and implementation of data models for enterprise-level applications and systems.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR and Elasticsearch), Hadoop, Python and Spark, and in the effective use of MapReduce, SQL and Cassandra to solve big data problems.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
- Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala and Chef.
- Experience in developing and designing POCs using Scala, Spark SQL and MLlib libraries, then deploying them on the YARN cluster.
- Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snowflake Modeling for Fact and Dimension Tables) using Analysis Services.
- Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
- Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experience with Client-Server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience with architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio, and in working with Teradata, BTEQ, MLDM and MDM.
- Experienced in R and Python for statistical computing; also experienced with MLlib (Spark), MATLAB, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases like MongoDB, HBase and Cassandra.
- Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs and sends them to a Kafka and ZooKeeper based log collection platform.
- Excellent experience with NoSQL databases like MongoDB and Cassandra, and in writing applications with the Apache Spark Streaming API on a Big Data distribution in an active cluster environment.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Performed performance tuning at the source, target and DataStage job levels using indexes, hints and partitioning in DB2, Oracle and DataStage.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
TECHNICAL SKILLS
Big Data technologies: MapReduce, HBase 1.2, HDFS, Sqoop 1.4, Spark, Hadoop 3.0, Hive 2.3, PIG, Impala 2.1.
Cloud Architecture: Amazon AWS, EC2, Elasticsearch, Elastic Load Balancing & Basic MS Azure
Data Modeling Tools: ER/Studio V17, Erwin 9.7, Sybase PowerDesigner.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and defect tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center), Requisite, MS Visio & Visual SourceSafe
Operating System: Windows, Unix, Sun Solaris
ETL/Datawarehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau 10, and Pentaho.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROJECT EXPERIENCE
Confidential - Washington, DC
Sr. Big Data Architect
Responsibilities:
- Led architecture and design of data processing, warehousing and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive with cloud architecture.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Worked on AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Extracted the data from MySQL and AWS Redshift into HDFS using Sqoop.
- Imported millions of structured records from relational databases using Sqoop import, processed them using Spark and stored the data in HDFS in CSV format.
- Designed and Developed Real time Stream processing Application using Spark, Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used the DataFrame API in Scala for converting the distributed collection of data into named columns.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Developed Spark streaming application to pull data from cloud to Hive table.
- Used Spark SQL to process large amounts of structured data.
- Assigned a name to each of the columns using the case class option in Scala.
- Used Talend for Big Data integration using Spark and Hadoop.
- Used Microsoft Windows server and authenticated client server relationship via Kerberos protocol.
- Identified query duplication, complexity and dependency to minimize migration efforts.
- Worked on Talend Magic Quadrant for performing fast integration tasks.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Used Apache Spark for batch processing to source the data.
- Developed predictive analytics using Apache Spark Scala APIs.
- Involved in big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded the data into the tables and queried the data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume.
- Expert in performing business analytical scripts using Hive SQL.
- Implemented continuous integration & deployment (CICD) through Jenkins for Hadoop jobs.
- Worked on writing Hadoop jobs for analyzing data using Hive and Pig, accessing text format files, sequence files and Parquet files.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Distributions (HDP) and MapR.
- Made enhancements to the traditional data warehouse based on a star schema, updated data models, and performed Data Analytics and Reporting using Tableau.
Environment: Hadoop 3.0, MapReduce Frameworks, HBase 1.2, Hive 2.3, AWS, EC2, S3, RDS, Redshift, VPC, MySQL, Sqoop, CSV, HDFS, Spark, Kafka, Scala, ETL, YARN, Talend, Kerberos, Pig, Python, Java, UDF, HQL, NoSQL, Flume 1.8, Cassandra 3.11, Hortonworks, MapR, Tableau r15
Confidential - Troy, NY
Sr. Big Data Analyst
Responsibilities:
- Involved in different phases of the development life cycle including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
- Worked in the Advanced Operational Analytics and Big Data Analysis team.
- Designed the business layer and database layer, and implemented transaction management into the existing architecture.
- Worked in Agile environment and participated in daily Stand-ups/Scrum Meetings.
- Worked on NoSQL databases such as MongoDB, HBase and Cassandra to enhance scalability and performance.
- Created a Load Balancer on AWS EC2 for a stable cluster and services that provide fast and effective processing of data.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Used AWS Lambda to perform data validation, filtering, sorting or other transformations for every data change in an HBase table and loaded the transformed data to another data store.
- Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
- Loaded data from different servers to an S3 bucket and set appropriate bucket permissions.
- Created Hive queries for supporting the existing application.
- Wrote HiveQL and managed the Hive Metastore server to control different advanced activities.
- Worked with statistical analysis patterns and created dashboards for quick reference, shared with internal customers on a daily, weekly or monthly basis.
- Worked on partitioning Hive tables and running scripts in parallel to reduce the run time of the scripts.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Worked with streaming and data warehousing projects.
- Installed and configured Hive and wrote Hive UDFs.
- Worked with JSON scripts, MongoDB and the Unix environment for NoSQL data clean-up and grouping, and created the analysis reports.
- Wrote Python scripts and Java code for business applications and MapReduce programs.
- Worked with the Hive warehouse directory, Hive tables and services.
- Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R.
- Analyzed large data sets (structured and unstructured) using Hive queries, R Programming & Pig Scripts.
- Used the Spark shell for interactive data analysis and processing, using Spark SQL to query structured data.
- Created Stored Procedures to communicate with SQL database.
- Involved in writing complex SQL queries and provided SQL scripts for the configuration data used by the application.
- Developed Tableau data visualizations using Cross tabs, Heat maps, Box and Whisker charts, Scatter Plots, Geographic Maps, Pie Charts, Bar Charts and Density Charts.
- Used Tableau to generate dashboards and statistical reports, and created a portal using the Tableau JavaScript API.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
Environment: NoSQL, MongoDB 3.6, HBase 1.2, Cassandra, AWS, EC2, Agile, Amazon Redshift, Hadoop frameworks, S3, UDFs, JSON, Scripts, UNIX, MapReduce, Python, R, Tableau
Confidential - Little Rock, AR
Hadoop Engineer
Responsibilities:
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python and Scala.
- Worked extensively with NoSQL databases like MongoDB and Cassandra.
- Moved relational database data using Sqoop into Hive dynamic partition tables using staging tables.
- Provided technical support during delivery of MDM (Master Data Management) components.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Worked extensively on the Core and Spark SQL modules of Spark.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Worked with the Data Governance, Data Quality and Metadata Management teams to understand the project.
- Implemented Optimized join base by joining different data sets to get top claims based on state using Map Reduce.
- Created HBase tables to store various data formats of data coming from different sources.
- Responsible for importing log files from various sources into HDFS using Flume.
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Completed a proof of concept using an Apache NiFi workflow in place of Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
- Designed NiFi flows to pull data from various sources and push it into HDFS and Cassandra.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Worked with NiFi for managing the flow of data from source to HDFS.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Used Hive and Pig to generate BI reports.
- Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
- Worked on custom Pig Loaders and storage classes to work with a variety of data formats, including XML file formats.
- Used Apache Kafka for tracking data ingestion to Hadoop cluster.
- Integrated Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Hive, Pig, and Sqoop.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Used Impala for data analysis.
- Experienced in monitoring the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MongoDB, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Oozie, Kafka, SQL, ETL, Cloudera Manager, MySQL.
Confidential - Atlanta, GA
Sr. Data Modeler/Data Architect
Responsibilities:
- Responsible for the data architecture design delivery, data model development, review, approval and Data Warehouse implementation.
- Designed and developed the conceptual, then logical and physical data models to meet the needs of reporting.
- Familiarity with a NoSQL database such as MongoDB.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational databases and maintained Database Objects in the data model using Erwin 9.5.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Used SDLC Methodology of Data Warehouse development using Kanbanize.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
- Performed Data Mapping and Data Design (Data Modeling) to integrate the data across multiple databases into the EDW.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Developed Master data management strategies for storing reference data.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Involved in Testing like Unit testing, System integration and regression testing.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Developed Source to Target Matrix with ETL transformation logic for ETL team.
- Cleansed, extracted and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Responsible for Metadata Management, keeping up to date centralized metadata repositories using Erwin modeling tools.
- Involved in debugging and tuning the PL/SQL code, tuning queries and optimizing the SQL database.
- Led data migration from legacy systems into modern data integration frameworks from conception to completion.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server 2014 database systems.
- Managed the metadata for the Subject Area models for the Data Warehouse environment.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Involved in performing extensive back-end testing by writing SQL queries and PL/SQL stored procedures to extract the data from the SQL database.
- Participated in code/design reviews and provided input into best practices for reports and universe development.
- Involved in Netezza administration activities like backup/restore, performance tuning, and security configuration.
- Involved in the validation of the OLAP, Unit Testing and System Testing of the OLAP report functionality and the data displayed in the reports.
- Created a high-level, industry-standard, generalized data model to convert into logical and physical models at later stages of the project using Erwin and Visio.
- Participated in Performance Tuning using Explain Plan and TKPROF.
- Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
Environment: Erwin 9.5, HDFS, HBase, Hadoop, Metadata, MS Visio, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
Confidential - Teaneck, NJ
Data Modeler/ Data Analyst
Responsibilities:
- Created the Physical Data Model from the Logical Data Model using the Compare and Merge Utility in ER/Studio and worked with the naming standards utility.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models.
- Created database objects like tables, views, materialized views, procedures and packages using Oracle tools like PL/SQL and SQL*Loader, and handled exceptions.
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Utilized SDLC and Agile methodologies such as SCRUM.
- Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
- Worked on Data Analysis, Data profiling, and Data Modeling, data governance identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
- Loaded multi-format data from various sources like flat files, Excel and MS Access, and performed file system operations.
- Used T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data marts.
- Worked on Physical design for both SMP and MPP RDBMS, with understanding of RDBMS scaling features.
- Wrote SQL Queries, Dynamic-queries, sub-queries and complex joins for generating Complex Stored Procedures, Triggers, User-defined Functions, Views and Cursors.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Performed ETL SQL optimization, designed the OLTP system environment and maintained documentation of Metadata.
- Involved with Data Analysis, primarily identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
- Worked with developers on data Normalization and De-normalization, performance tuning issues, and provided assistance in stored procedures as needed.
- Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
- Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
- Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from different sources into the target Oracle Data Warehouse.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
- Created VBA Macros to convert the Excel input files into the correct format and loaded them to SQL Server.
- Helped the BI and ETL Developers in understanding the Data Model, data flow and the expected output for each model created.
Environment: ER/Studio 8.0, Oracle 10g Application Server, Oracle Developer Suite, PL/SQL, T-SQL, SQL*Plus, SSIS, Teradata 13, OLAP, OLTP, SAS, MS Excel.
Confidential
Data Analyst
Responsibilities:
- Interacted with business users to identify and understand business requirements and identified the scope of the projects.
- Identified and designed business entities, attributes and relationships between the entities to develop a logical model, and later translated the model into a physical model.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Enforced Referential Integrity (R.I) for consistent relationship between parent and child tables.
- Worked with users to identify the most appropriate source of record and profiled the data required for sales and service.
- Involved in defining the business/transformation rules applied for ICP data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Developed the financial reporting requirements by analyzing the existing Business Objects reports.
- Utilized Informatica toolset (Informatica Data Explorer, and Informatica Data Quality) to analyze legacy data for data profiling.
- Reverse engineered the Data Models, identified the Data Elements in the source systems and added new Data Elements to the existing data models.
- Created XSDs for applications to connect the interface and the database.
- Compared data with original source documents and validated data accuracy.
- Used reverse engineering to create Graphical Representation (E-R diagram) and to connect to existing database.
- Generated weekly and monthly asset inventory reports.
- Evaluated data profiling, cleansing, integration and extraction tools (e.g. Informatica).
- Coordinated with the business users to provide an appropriate, effective and efficient way to design the new reporting needs based on user needs and the existing functionality.
- Analyzed the impact of low quality and/or missing data on the performance of the data warehouse client.
- Worked with nzload to load flat file data into Netezza tables. Good understanding of Netezza architecture.
- Identified design flaws in the data warehouse and executed DDL to create databases, tables and views.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Involved in Data Mapping activities for the data warehouse.
- Created and configured Workflows, Worklets and Sessions to transport the data to target warehouse Netezza tables using Informatica Workflow Manager.
- Extensively worked on Performance Tuning and understanding Joins and Data distribution.
- Coordinated with DBAs and generated SQL codes from data models.
- Generated reports using Crystal Reports for better communication between business teams.
Environment: SQL Server, Oracle 9i, MS Office, Embarcadero, Crystal Reports, Netezza, Teradata, Enterprise Architect, Toad, Informatica, ER/Studio, XML, OBIEE