- Above 10+ years of experience as Big Data Architect/Data Modeler/Data Architect and Data Analyst including designing, developing and implementation of data models for enterprise - level applications and systems.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, and EMR, Elastic search), Hadoop, Python, Spark and effective use of MapReduce, SQL and Cassandra to solve big data type problems.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
- Hands on experience in Normalization (1NF, 2NF, 3NF and BCNF) Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Hands on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, PIG, HIVE, HBASE, Apache Crunch, ZOOKEEPER, SCIOOP, Hue, Scala and CHEF.
- Experience in developing and designing POC's using Scala, Spark SQL and MLlib libraries then deployed on the Yarn cluster.
- Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
- Expertise in Data Architect, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Centre.
- Experience in designing, building and implementing complete Hadoop ecosystem comprising of Map Reduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experience with Client-Server application development using Oracle PL/SQL, SQL PLUS, SQL Developer, TOAD, and SQL LOADER.
- Strong experience with architecting highly per formant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio, Teradata, BTEQ, MLDM and MDM.
- Experienced on R and Python for statistical computing. Also experience with MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS
- Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, PIG, HIVE, Flume, Sqoop, SPARK, Impala, Scala), NoSQL databases like MongoDB, HBase, Cassandra.
- Experienced on implementation of a log producer in Scala that watches for application logs, transform incremental log and sends them to a Kafka and Zookeeper based log collection platform.
- Excellent experienced on NoSQL databases like MongoDB, Cassandra and write Apache Spark streaming API on Big Data distribution in the active duster environment.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Performed the performance and tuning at source, Target and Data Stage job levels using Indexes, Hints and Partitioning in DB2, ORACLE and Data Stage.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Good experience working on analysis tool like Tableau for regression analysis, pie charts, and bar graphs.
Big Data technologies: MapReduce, HBase 1.2, HDFS, Sqoop 1.4, Spark, Hadoop 3.0, Hive 2.3, PIG, Impala 2.1.
Cloud Architecture: Amazon AWS, EC2, Elastic Search, Elastic Load Balancing & Basic MS Azure
Data Modeling Tools: ER/Studio V17, Erwin 9.7, Power Sybase Designer.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and defect tracking Tools: HP/Mercury (Quality Center, Win Runner, Quick Test Professional, Performance Center, Requisite, MS Visio & Visual Source Safe
Operating System: Windows, Unix, Sun Solaris
ETL/Datawarehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau 10, and Pentaho.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
Confidential - Washington, DC
Sr. Big Data Architect
- Lead architecture and design of data processing, warehousing and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive with Cloud Architecture.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Worked on AWS, implementing solutions using services like (EC2, S3, RDS, Redshift, VPC)
- Worked with AWS to implement the client-side encryption as Dynamo DB does not support at rest encryption at this time.
- Extracted the data from MySQL, AWS Redshift into HDFS using Sqoop.
- Imported millions of structured data from relational databases using Sqoop import to process using Spark and stored the data into HDFS in CSV format.
- Designed and Developed Real time Stream processing Application using Spark, Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
- Explored with the Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.
- Used Data Frame API in Scala for converting the distributed collection of data organized into named columns.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Developed Spark streaming application to pull data from cloud to Hive table.
- Used Spark SQL to process the huge amount of structured data.
- Assigned name to each of the columns using case class option in Scala.
- Used Talend for Big data Integration using Spark and Hadoop
- Used Microsoft Windows server and authenticated client server relationship via Kerberos protocol.
- Identify query duplication, complexity and dependency to minimize migration efforts
- Worked on Talend Magic Quadrant for performing fast integration tasks.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Used Apache Spark for batch processing to source the data.
- Developed predictive analytic using Apache Spark Scala APIs.
- Involved in working of big data analysis using Pig and User defined functions (UDF)
- Created Hive External tables and loaded the data into tables and query data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Implement enterprise grade platform (mark logic) for ETL from mainframe to NoSQL (Cassandra)
- Responsible for importing log files from various sources into HDFS using Flume.
- Expert in performing business analytical scripts using Hive SQL.
- Implemented continuous integration & deployment (CICD) through Jenkins for Hadoop jobs.
- Worked in writing Hadoop Jobs for analyzing data using Hive, Pig accessing Text format files, sequence files, Parquet files.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Distributions (HDP) and MapR.
- Enhancements to traditional data warehouse based on STAR schema, update data models, perform Data Analytics and Reporting using Tableau.
Environment: Hadoop 3.0, Map Reduce Frameworks, HBase 1.2, Hive 2.3, AWS, EC2, S3, RDS, Redshift, VPC, MySQL, Redshift, Sqoop, CSV, HDFS, Spark, Kafka, Scala, ETL, YARN, Talend, Kerberos, Pig, Python, Java, UDF, HQL, NoSQL, Flume 1.8, Cassandra 3.11, Hortonworks, MapR, Tableau r15
Confidential - Troy, NY
Sr. Big Data Analyst
- Involved in different phases of Development life including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
- Worked in the Advanced Operational Analytics and Big Data Analysis team.
- Designed business layer, database layer, and implemented transaction management into the existing architecture.
- Worked in Agile environment and participated in daily Stand-ups/Scrum Meetings.
- Worked on NOSQL databases such as MongoDB, HBase and Cassandra to enhance scalability and performance.
- Created Load Balancer on AWS EC2 for stable cluster and services which provide fast and effective processing of Data.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Used AWS Lambda to perform data validation, filtering, sorting or other transformations for every data change in HBase table and load the transformed data to another data store.
- Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
- Loaded data from different servers to S3 bucket and setting appropriate bucket permissions.
- Created Hive queries for supporting the existing application.
- Wrote the HiveQL and manage Hive Meta store server to control different advanced activities.
- Worked with statistical analysis patterns and create the dashboards for quick references and share to the internal customers on daily, weekly or monthly basis.
- Worked on partitioning Hive tables and running scripts parallel to reduce run time of the scripts.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Worked with streaming and Data ware housing projects.
- Installed and configured Hive and written Hive UDFs.
- Worked in Json scripts, mongo dB and Unix environment to non-Sql data clean-up grouping and create the analysis reports.
- Wrote python scripts and java coding for business applications and MapReduce programs.
- Worked with hive warehouse directory and hive tables and services.
- Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R
- Analyzed large data sets (structured and unstructured) using Hive queries, R Programming & Pig Scripts.
- Used Spark shell for interactive data analysis and process using Spark Sql to query structured data.
- Created Stored Procedures to communicate with SQL database.
- Involved in writing complex SQL Queries and provided SQL Scripts for the Configuration Data which is used by the application.
- Developed Tableau data visualization using Cross tabs, Heat maps, Box and Whisker charts, Scatter Plots, Geographic Map, Pie Charts and Bar Charts and Density Chart.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
Environment: NOSQL, MongoDB 3.6, HBase 1.2, Cassandra, AWS, EC2, Agile, Amazon Redshift, Hadoop frameworks, S3, UDFs, Json, Scripts, UNIX, MapReduce, Python, R, Tableau
Confidential - Little Rock, AR
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python and Scala.
- Worked extensively with the No SQL databases like MongoDB and Cassandra.
- Moved Relational Data base data using Sqoop into Hive Dynamic partition tables using staging tables.
- Provided technical support during delivery of MDM (Master Data Management) components.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Extensively worked on the core and Spark SQL modules of Spark.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Worked with Data Governance, Data Quality and Metadata Management team to understand project.
- Implemented Optimized join base by joining different data sets to get top claims based on state using Map Reduce.
- Created HBase tables to store various data formats of data coming from different sources.
- Responsible for importing log files from various sources into HDFS using Flume.
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Done Proof of Concept in Apache Nifi workflow in place of Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Designed Nifi to pull data from various sources and push it in HDFS and Cassandra.
- Integrating bulk data into Cassandra file system using MapReduce programs.
- Worked with Nifi for managing the flow of data from source to HDFS.
- Created customized BI tool for manager team that perform Query analytics using HiveQL.
- Used Hive and Pig to generate BI reports.
- Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
- Worked on custom Pig Loaders and storage classes to work with variety of data formats in XML file formats.
- Used Apache Kafka for tracking data ingestion to Hadoop cluster.
- Integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Hive, Pig, and Sqoop.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Used Impala for data analysis.
- Experienced in Monitoring Cluster using Cloudera manager.
Environment: Hadoop, HDFS, HBase, MongoDB, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Oozie, Kafka, SQL, ETL, Cloudera Manager, MySQL.
Confidential - Atlanta GA
Sr. Data Modeler/Data Architect
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Designed and developed the conceptual then logical and physical data models to meet the needs of reporting.
- Familiarity with a NoSQL database such as MongoDB.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin 9.5
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Used SDLC Methodology of Data Warehouse development using Kanbanize.
- Worked with Hadoop eco system covering HDFS, HBase, YARN and Map Reduce.
- Performed the Data Mapping, Data design (Data Modeling) to integrate the data across the multiple databases in to EDW.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Developed Master data management strategies for storing reference data.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Involved in Testing like Unit testing, System integration and regression testing.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Perform reverse engineering of the dashboard requirements to model the required data marts.
- Developed Source to Target Matrix with ETL transformation logic for ETL team.
- Cleansed, extracted and analyzed business data on daily basis and prepared ad-hoc analytical reports using Excel and T-SQL
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Responsible for Metadata Management, keeping up to date centralized metadata repositories using Erwin modeling tools.
- Involved in debugging and Tuning the PL/SQL code, tuning queries, optimization for the Sql database.
- Lead data migration from legacy systems into modern data integration frameworks from conception to completion.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server 2014 database systems..
- Managed the meta-data for the Subject Area models for the Data Warehouse environment.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using Map Reduce, loaded data into HDFS and Extracted the data from My SQL into HDFS using Sqoop
- Involved in performing extensive Back-End testing by writing SQL queries and PL/SQL stored procedures to extract the data from SQL Database.
- Participate in code/design reviews and provide input into best practices for reports and universe development.
- Involved in Netezza Administration Activities like backup/restore, performance tuning, and Security configuration
- Involved in the validation of the OLAP, Unit testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
- Created a high-level industry standard, generalized data model to convert it into logical and physical model at later stages of the project using Erwin and Visio
- Participated in Performance Tuning using Explain Plan and TKPROF.
- Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
Environment:Erwin 9.5, HDFS, HBase, Hadoop, Metadata, MS Visio, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
Confidential - Teaneck, NJ
Data Modeler/ Data Analyst
- Created Physical Data Analyst from the Logical Data Analyst using Compare and Merge Utility in ER Studio and worked with the naming standards utility.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Extensively used Star Schema methodologies in building and designing the logicaldatamodel into Dimensional Models
- Creation of database objects like tables, views, Materialized views, procedures, packages using Oracle tools like PL/SQL, SQL*Loader and Handled Exceptions.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Utilized SDLC and Agile methodologies such as SCRUM.
- Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
- Worked on Data Analysis, Data profiling, and Data Modeling, data governance identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
- Loaded multi format data from various sources like flat-file, Excel, MS Access and performing file system operation.
- Used T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data marts.
- Worked on Physical design for both SMP and MPP RDBMS, with understanding of RDMBS scaling features.
- Wrote SQL Queries, Dynamic-queries, sub-queries and complex joins for generating Complex Stored Procedures, Triggers, User-defined Functions, Views and Cursors.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Performed ETL SQL optimization designed OLTP system environment and maintained documentation of Metadata.
- Involved withDataAnalysis primarily IdentifyingDataSets, SourceData, Source MetaData, Data Definitions andDataFormats
- Worked with developers on data Normalization and De-normalization, performance tuning issues, and provided assistance in stored procedures as needed.
- Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
- Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
- Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from different sources in to target Oracle Data Warehouse.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
- Created VBA Macros to convert the Excel Input files in to correct format and loaded them to SQL Server.
- Helped the BI, ETL Developers in understanding the Data Model, data flow and the expected output for each model created
Environment: ER/Studio 8.0, Oracle 10g Application Server, Oracle Developer Suite, PL/SQL, T-SQL, SQL plus, SSIS, Teradata 13, OLAP, OLTP, SAS, MS Excel.
- Interacted with business users to identify and understand business requirements and identified the scope of the projects.
- Identified and designed business Entities and attributes and relationships between the Entities to develop a logical model and later translated the model into physical model.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Enforced Referential Integrity (R.I) for consistent relationship between parent and child tables.
- Work with users to identify the most appropriate source of record and profile the data required for sales and service.
- Involved in defining the business/transformation rules applied for ICP data.
- Define the list codes and code conversions between the source systems and the data mart.
- Developed the financing reporting requirements by analyzing the existing business objects reports
- Utilized Informatica toolset (Informatica Data Explorer, and Informatica Data Quality) to analyze legacy data for data profiling.
- Reverse Engineered the Data Models and identified the Data Elements in the source systems and adding new Data Elements to the existing data models.
- Created XSD's for applications to connect the interface and the database.
- Compare data with original source documents and validate Data accuracy.
- Used reverse engineering to create Graphical Representation (E-R diagram) and to connect to existing database.
- Generate weekly and monthly asset inventory reports.
- Evaluated data profiling, cleansing, integration and extraction tools (e.g. Informatica)
- Coordinate with the business users in providing appropriate, effective and efficient way to design the new reporting needs based on the user with the existing functionality
- Worked on some impact of low quality and/or missing data on the performance of data warehouse client.
- Worked with NZ Load to load flat file data into Netezza tables. Good understanding about Netezza architecture.
- Identified design fl in the data warehouse and executed DDL to create databases, tables and views.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Involved in Data Mapping activities for the data warehouse.
- Created and Configured Workflows, Work lets, and Sessions to transport the data to target warehouse Netezza tables using Informatica Workflow Manager.
- Extensively worked on Performance Tuning and understanding Joins and Data distribution.
- Coordinated with DBAs and generated SQL codes from data models.
- Generate reports using crystal reports for better communication between business teams.
Environment: SQL/Server, Oracle9i, MS-Office, Embarcadero, Crystal Reports, Netezza, Teradata, Enterprise Architect, Toad, Informatica, ER Studio, XML, Informatica, OBIEE