- 9+ years of experience as a Big Data Engineer, Data Modeler, Data Architect, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, and in effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Knowledge of and working experience with big data tools like Hadoop, Azure Data Lake, and AWS Redshift.
- Hands-on experience with normalization (1NF, 2NF, 3NF, and BCNF) and denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, and Chef.
- Experience in designing and developing POCs using Scala, Spark SQL, and MLlib libraries, then deploying them on the YARN cluster.
- Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball methodology (star schema modeling, snowflake modeling for fact and dimension tables) using Analysis Services.
- Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
- Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
- Experienced with R and Python for statistical computing, as well as MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases like MongoDB, HBase, and Cassandra.
- Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and ZooKeeper based log collection platform.
- Excellent experience with NoSQL databases like MongoDB and Cassandra, and in writing Apache Spark Streaming API code on a Big Data distribution in an active cluster environment.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2, Oracle, and DataStage.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
Big Data technologies: MapReduce, HBase 1.2, HDFS, Sqoop 1.4, Spark, Hadoop 3.0, Hive 2.3, Pig, Impala 2.1.
Cloud Architecture: Amazon AWS, EC2, Elasticsearch, Elastic Load Balancing & Basic MS Azure
Data Modeling Tools: ER/Studio V17, Erwin 9.7, Sybase PowerDesigner.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and defect tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center), Requisite, MS Visio & Visual SourceSafe
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau 10, and Pentaho.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
Confidential, San Antonio, TX
Sr. Big Data Engineer
- Worked on the Hortonworks Data Platform; ran Sqoop jobs to ingest data from MySQL into HDFS and created Hive external tables for querying the data.
- Used Spark DataFrame APIs to ingest Oracle data into S3 and store it in Redshift, and wrote a script to load RDBMS data into Redshift.
- Created RDDs, transformations, and actions while implementing Spark applications.
- Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Optimized the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and RDDs.
- Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, efficient joins, and other transformations.
- Developed and supported data loads into the Cassandra NoSQL database.
- Processed the complex/nested JSON and CSV data using Data Frame API.
- Automatically scaled up the EMR instances based on the data, and scheduled and executed Spark scripts in EMR Pipes.
- Validated the source and final output data and tested the data using Dataset API instead of RDD.
- Designed the dimensional data model (star schema, snowflake schema) using Erwin Data Modeler.
- Prepared the high-level design document, source-to-target document, and data dictionary.
- Built data mart and data warehouse tables in Netezza and SQL Server with help from the DBA.
- Designed and developed parallel jobs using DataStage and QualityStage stages in the DataStage Designer client.
- Created configuration and parameter files for the reusable shell scripts.
- Developed reports using Tableau, SAP Lumira, and D3.js.
- Used GitHub for version control.
- Proposed and developed dashboards using D3.js for analytical purposes.
- Scheduled, monitored, and supported jobs using control.
- Developed a purge script in shell.
Environment: HDFS, Hive, Pig, Spark, IBM Datastage 11.5, Aginity, JSON, Netezza, DB2, MS SQL, GitLab, Cassandra.
Confidential, Houston, TX
Jr. Big Data Engineer
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing data-at-rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive with cloud architecture.
- Constructed and maintained an appropriate, scalable, and easy-to-use infrastructure with various tools to support the development of actionable reports used in decision-making across the strategy team.
- Developed and maintained reports, dashboards, cubes, and scorecards to deliver information requests and deepen the analytics capabilities of operations, fiscal, and strategy staff.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Worked on AWS, implementing solutions using services such as EC2, S3, RDS, Redshift, and VPC.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Extracted the data from MySQL, AWS Redshift into HDFS using Sqoop.
- Imported millions of structured records from relational databases using Sqoop import, processed them using Spark, and stored the data in HDFS in CSV format.
- Designed and Developed Real time Stream processing Application using Spark, Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
- Explored Spark to improve the performance of, and optimize, the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to convert distributed collections of data into data organized into named columns.
- Supported various human capital functions, including human capital strategy, workforce planning and analytics, recruiting, employee engagement and retention, and performance management.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Developed Spark streaming application to pull data from cloud to Hive table.
- Used Spark SQL to process the huge amount of structured data.
- Assigned names to each of the columns using Scala case classes.
- Used Talend for Big Data integration using Spark and Hadoop.
- Used Microsoft Windows server and authenticated client server relationship via Kerberos protocol.
- Identified query duplication, complexity, and dependency to minimize migration efforts.
- Worked on Talend Magic Quadrant for performing fast integration tasks.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Used Apache Spark for batch processing to source the data.
- Developed predictive analytics using Apache Spark Scala APIs.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded data into the tables, and queried the data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume.
- Expert in writing business analytics scripts using Hive SQL.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Distributions (HDP) and MapR.
- Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
Environment: Hadoop 3.0, HBase 1.2, Hive 2.3, AWS, EC2, S3, RDS, VPC, MySQL, Redshift, Sqoop, HDFS, Spark, ETL, YARN, Talend, Python, UDF, HQL, NoSQL, Flume 1.8, Cassandra 3.11, Hortonworks, MapR, Tableau r15
Confidential, Edison, NJ
Big Data Analyst
- Involved in different phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review, and release, as per the business requirements.
- Worked in the Advanced Operational Analytics and Big Data Analysis team.
- Designed business layer, database layer, and implemented transaction management into the existing architecture.
- Worked in Agile environment and participated in daily Stand-ups/Scrum Meetings.
- Worked on NoSQL databases such as MongoDB, HBase, and Cassandra to enhance scalability and performance.
- Created Load Balancer on AWS EC2 for stable cluster and services which provide fast and effective processing of Data.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Used AWS Lambda to perform data validation, filtering, sorting or other transformations for every data change in HBase table and load the transformed data to another data store.
- Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
- Loaded data from different servers to an S3 bucket and set appropriate bucket permissions.
- Created Hive queries for supporting the existing application.
- Wrote HiveQL and managed the Hive Metastore server to control different advanced activities.
- Worked with statistical analysis patterns and created dashboards for quick reference, shared with internal customers on a daily, weekly, or monthly basis.
- Worked on partitioning Hive tables and running scripts in parallel to reduce script run time.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Worked with streaming and data warehousing projects.
- Installed and configured Hive and wrote Hive UDFs.
- Worked with JSON scripts, MongoDB, and a UNIX environment to clean up and group NoSQL data and create analysis reports.
- Wrote Python scripts and Java code for business applications and MapReduce programs.
- Worked with hive warehouse directory and hive tables and services.
- Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R.
- Analyzed large data sets (structured and unstructured) using Hive queries, R Programming & Pig Scripts.
- Used the Spark shell for interactive data analysis and processing, and Spark SQL to query structured data.
- Created Stored Procedures to communicate with SQL database.
- Involved in writing complex SQL Queries and provided SQL Scripts for the Configuration Data which is used by the application.
- Developed Tableau data visualization using Cross tabs, Heat maps, Box and Whisker charts, Scatter Plots, Geographic Map, Pie Charts and Bar Charts and Density Chart.
- Worked closely with business analysts to gather requirements and translate them into technical documentation.
Environment: NoSQL, MongoDB 3.6, HBase 1.2, Cassandra, AWS, EC2, Agile, Amazon Redshift, Hadoop frameworks, S3, UDFs, JSON scripts, UNIX, MapReduce, Python, R, Tableau
Confidential, Brentwood, TN
Data Modeler/Data Architect
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Designed and developed the conceptual then logical and physical data models to meet the needs of reporting.
- Familiarity with a NoSQL database such as MongoDB.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational databases and maintained database objects in the data model using Erwin 9.5.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Used SDLC Methodology of Data Warehouse development using Kanbanize.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Developed Master data management strategies for storing reference data.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Involved in Testing like Unit testing, System integration and regression testing.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Developed Source to Target Matrix with ETL transformation logic for ETL team.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Responsible for Metadata Management, keeping up to date centralized metadata repositories using Erwin modeling tools.
- Involved in debugging and tuning PL/SQL code, tuning queries, and optimizing the SQL database.
- Led data migration from legacy systems into modern data integration frameworks from conception to completion.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy DB2 and SQL Server 2014 database systems.
- Managed the meta-data for the Subject Area models for the Data Warehouse environment.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Involved in performing extensive Back-End testing by writing SQL queries and PL/SQL stored procedures to extract the data from SQL Database.
- Participated in code/design reviews and provided input into best practices for reports and universe development.
- Involved in Netezza administration activities such as backup/restore, performance tuning, and security configuration.
- Involved in the validation of the OLAP, Unit testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
- Created a high-level, industry-standard, generalized data model to convert into logical and physical models at later stages of the project using Erwin and Visio.
- Participated in Performance Tuning using Explain Plan and TKPROF.
- Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
Environment: Erwin 9.5, HDFS, HBase, Hadoop, Metadata, MS Visio, SQL Server 2014, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
Confidential, Plano, TX
Data Modeler/ Data Analyst
- Created the Physical Data Model from the Logical Data Model using the Compare and Merge utility in ER/Studio and worked with the naming standards utility.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Extensively used star schema methodologies in building and designing the logical data model into dimensional models.
- Created database objects such as tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL and SQL*Loader, and handled exceptions.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Utilized SDLC and Agile methodologies such as SCRUM.
- Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
- Worked on data analysis, data profiling, data modeling, and data governance, identifying data sets, source data, source metadata, data definitions, and data formats.
- Loaded multi-format data from various sources such as flat files, Excel, and MS Access, and performed file system operations.
- Used T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into data marts.
- Worked on physical design for both SMP and MPP RDBMS, with an understanding of RDBMS scaling features.
- Wrote SQL Queries, Dynamic-queries, sub-queries and complex joins for generating Complex Stored Procedures, Triggers, User-defined Functions, Views and Cursors.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Performed ETL SQL optimization, designed the OLTP system environment, and maintained metadata documentation.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Worked with developers on data Normalization and De-normalization, performance tuning issues, and provided assistance in stored procedures as needed.
- Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
- Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
- Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from different sources into the target Oracle Data Warehouse.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
- Created VBA macros to convert the Excel input files into the correct format and loaded them into SQL Server.
- Helped the BI and ETL developers understand the data model, data flow, and the expected output for each model created.
Environment: ER/Studio 8.0, Oracle 10g Application Server, Oracle Developer Suite, PL/SQL, T-SQL, SQL*Plus, SSIS, Teradata 13, OLAP, OLTP, SAS, MS Excel.
Confidential, Columbus, Ohio
- Worked with clients and business analysts to identify service requirements and determine what data they need to perform functions.
- Interfaced with the data architecture team to design logical and physical models.
- Generated DDLs and data mapping documents for Hadoop projects.
- Reviewed business object Model, logical data model, physical data model and existing services.
- Identified different services, entities, and attributes required to satisfy data requests.
- Performed Mapping of required elements to a simplified view for service.
- Documented service request and response in terms of governing models.
- Worked closely with the development team in loading data into CDS (Chase Data Services) and performing the PI classification.
- Performed data reviews as a part of the Data Consistency Review Board (DCRB).
- Created and updated other CCB artifacts like mapping documents.
- Resolved discrepancies in terms of duplications, inconsistency and quality by correcting with appropriate naming standards (class words).
- Designed and provided grants to the DBAs for the database environments.
- Created views, data partitioning, and staging tables for performance tuning and optimization.
- Helped in data governance projects.
- Maintained the master PI classification document for the team to make the PI classification for all the SORs available in one go.