- 8+ years of IT experience as a Big Data Engineer/Data Engineer, focused on design and data analysis across the Big Data/Hadoop ecosystem and applied information technology.
- Experience in Data modeling, complex data structures, Data processing, Data quality, Data lifecycle.
- Experience in Amazon AWS cloud services including EC2, S3, EBS, ELB, AMI, IAM, Route 53, Auto Scaling, CloudFront, CloudWatch, and Security Groups.
- A very good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M.
- Experience in metadata design and real-time BI architecture, including data governance, for greater ROI.
- Experienced in designing data warehouse architectures and models using tools like Erwin r9.6/r9.5, Sybase PowerDesigner, and ER/Studio.
- Proficient in System Analysis, ER/Dimensional Data Modeling, Database design and implementing RDBMS specific features.
- Well versed with Data Migration, Data Conversions, and Data Extraction/Transformation/Loading (ETL).
- Experience with Object Oriented Analysis and Design (OOAD) using UML, Rational Unified Process (RUP), Rational Rose and MS Visio.
- Experienced in Developing Triggers, Batch Apex, and Scheduled Apex classes.
- Experience in building high performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Solr and Kafka.
- Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi, and Flume.
- Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Experience with Teradata RDBMS utilities including FastLoad, FastExport, MultiLoad, TPump, BTEQ, and Teradata SQL Assistant.
- Experienced in Data Modeling including Data Validation/Scrubbing and Operational assumptions.
- Very good knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and Identifying Data Mismatch.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper and Apache Storm.
- Experience in writing MapReduce programs using Apache Hadoop to work with Big Data.
- Strong experience working with conceptual, logical and physical data modeling considering Metadata standards.
- Experience working with Agile and Waterfall data modeling methodologies.
- Experience with both the Ralph Kimball and Bill Inmon data warehousing approaches.
- Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNodes and MapReduce concepts.
- Strong knowledge in working with UNIX/LINUX environments, writing shell scripts and PL/SQL Stored Procedures.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala (see the sketch after this list).
- Developed Apache Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
- Hands-on experience developing and debugging YARN (MRv2) jobs to process large datasets.
- Data Processing: Processed data using MapReduce and YARN. Worked on Kafka as a proof of concept for log processing.
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Experienced in importing and exporting data from RDBMS into HDFS using Sqoop.
- Hands on experience in working with database like Oracle, MySQL and PL/SQL.
- Experienced in developing Web Services with Python programming language.
- Experience in Performance Tuning, Optimization and Customization.
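A minimal sketch, in Scala, of the MapReduce-to-Spark migration pattern referenced above: a word-count-style MapReduce job re-expressed as RDD transformations. The object name and HDFS paths are hypothetical illustrations, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: a classic MapReduce-style word count re-expressed as
// Spark RDD transformations in Scala. Paths are hypothetical.
object WordCountMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mapreduce-to-spark-poc")
      .getOrCreate()

    // map -> (key, 1), reduce -> sum per key, as in the original MR job
    val counts = spark.sparkContext
      .textFile("hdfs:///data/raw/logs")          // hypothetical HDFS input
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/out/wordcount")
    spark.stop()
  }
}
```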
Big Data Eco-System: Hadoop 3.0, HDFS, MapReduce, Hive 2.3, Pig, HBase 1.2, Spark 2.2, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4/CDH5, Hortonworks, Hadoop Streaming, Splunk, ZooKeeper 3.4, Oozie, Sqoop, Flume 1.8
Cloud Management: EC2, S3, AMI, RDS, Redshift, Azure, Azure Data Factory, Azure Data Lake
Data Modeling Tools: ER/Studio V17, Erwin 9.6/9.5, Sybase PowerDesigner
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Testing and defect tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio, Visual SourceSafe
Operating Systems: Windows, Macintosh, Linux, UNIX, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend and Tableau.
Languages: SQL, Shell Scripting, C/C++, Python 3.6, R, Scala
DBMS / RDBMS: Oracle 12c, SQL Server 2016/2014, DB2, Teradata 15/14
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Confidential, Chicago, IL
Sr. Big Data Engineer
- Extensively involved in the design phase and delivered design documents for the Hadoop ecosystem, covering HDFS, Hive, Pig, Sqoop, and Spark with Scala.
- Collected logs from physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
- Involved in the high-level design of the Hadoop architecture for the existing data structures and business processes.
- Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
- Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases, and Sqoop.
- Involved in loading disparate datasets into the Hadoop data lake, making them available to the data science team for predictive analytics.
- Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management (MDM).
- Implemented partitioning, dynamic partitions, and buckets in Hive to improve performance and organize data logically (see the Hive sketch after this section).
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Pulled data from an Amazon S3 bucket into the data lake, built Hive tables on top of it, and created DataFrames in Spark for further analysis.
- Deployed the Hadoop application on a multi-node cloud cluster backed by S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Explored MLlib algorithms in Spark to understand the machine learning functionality applicable to the use case.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
- In the preprocessing phase, used Spark to remove missing data and transform the data to create new features.
- Worked with commercial distributions of Hadoop including Hortonworks HDP, Cloudera CDH, and AWS (EMR, S3, and EC2).
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in loading data from UNIX file system to HDFS using Flume and HDFS API.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS (see the Kafka sketch after this section).
- Participated in design reviews, code reviews, unit testing and integration testing.
- Developed RDDs/DataFrames in Spark using Scala and Python and applied transformation logic to load data from the Hadoop data lake into HBase.
- Exported the analyzed data to HBase (NoSQL) for visualization and generated reports for the Business Intelligence team using SAS.
- Created Hive tables as internal or external tables per requirements, designed for efficiency.
- Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
- Used JIRA for bug tracking and GIT for version control.
Environment: Hadoop 3.0, HDFS, Hive 2.3, Pig, Sqoop, Spark 2.2, Scala, HBase 1.2, Kafka, Elasticsearch, MapReduce, MLlib, Flume 1.8, Python, AWS, Web Services, GIT, JIRA, MDM
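The Hive partitioning bullet above references this sketch: a minimal illustration of creating a partitioned, bucketed Hive table and loading it with dynamic partitions through Spark SQL. The table and column names are hypothetical, not taken from the project.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of Hive partitioning and bucketing issued through Spark SQL.
// All table and column names are hypothetical examples.
object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by a low-cardinality column; bucket by a high-cardinality key.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_part (
        order_id STRING, amount DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      CLUSTERED BY (order_id) INTO 32 BUCKETS
      STORED AS ORC
    """)

    // Dynamic partitioning: Hive derives partitions from the SELECT output.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE sales_part PARTITION (order_date)
      SELECT order_id, amount, order_date FROM sales_staging
    """)
    spark.stop()
  }
}
```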
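The Kafka bullet above references this sketch: one way to land a Kafka topic in HDFS using Spark's Structured Streaming API. This assumes the spark-sql-kafka connector is on the classpath; the broker, topic, and paths are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal Structured Streaming sketch: consume a Kafka topic and persist
// the records to HDFS as Parquet. All endpoints/paths are hypothetical.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch")
      .getOrCreate()

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "app-logs")                   // hypothetical topic
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    val query = stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/app-logs")   // hypothetical sink
      .option("checkpointLocation", "hdfs:///chk/app-logs")
      .start()

    query.awaitTermination()
  }
}
```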
Confidential, Arlington, VA
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters with agile methodology.
- Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools like HBase and Sqoop.
- Developed MapReduce programs to perform data filtering for unstructured data.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Impala.
- Successfully loaded files to Hive and HDFS from HBase.
- Worked on Classic (MRv1) and YARN versions of Hadoop across distributions including Apache Hadoop and Cloudera CDH4/CDH5.
- Created and altered HBase tables on top of data residing in Data Lake.
- Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Created and managed S3 buckets and policies for storage and backup purposes.
- Worked on developing ETL processes to load data from multiple data sources to HDFS using Flume and Sqoop.
- Performed structural modifications using MapReduce and Hive and analyzed data using visualization/reporting tools.
- Worked on the disaster recovery plan for the Hadoop cluster by implementing cluster data backup to Amazon S3 buckets.
- Installed and configured the ZooKeeper service to coordinate configuration information across all nodes in the cluster and manage it efficiently.
- Involved in converting HBase/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python (see the sketch after this section).
- Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools such as SQL Profiler and Database Tuning Advisor (DTA).
- Worked with multiple teams to understand their business requirements and the data in the source files.
- Created end to end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Explored Spark for improving performance and optimizing existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
Environment: Hadoop 3.0, HBase 1.2, Sqoop, MapReduce, Pig, Hive 2.3, Impala, HDFS, ZooKeeper, SQL, Spark, Scala, Python, YARN
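A minimal sketch of the Hive-query-to-Spark-transformation conversion referenced above, assuming hypothetical table and column names; the original HiveQL is shown in the comment for comparison.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch of rewriting a Hive aggregation query as Spark DataFrame
// transformations. Table and column names are hypothetical.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM transactions GROUP BY customer_id HAVING SUM(amount) > 1000
    val totals = spark.table("transactions")
      .groupBy("customer_id")
      .agg(sum("amount").as("total"))
      .filter(col("total") > 1000)

    totals.write.mode("overwrite").saveAsTable("customer_totals")
    spark.stop()
  }
}
```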
Confidential, St. Louis, MO
Sr. Data Architect/Data Modeler
- As a Sr. Data Architect/Modeler, collaborated with data architects and other data modelers on the team to design the enterprise-level standard data model.
- Interacted with users for verifying User Requirements, managing Change Control Process, updating existing Documentation.
- Worked with the architecture and development teams to help choose data-related technologies, design architectures, and model data in a manner that is efficient, scalable, and supportable.
- Worked closely with the development and database administrators to guide the development of the physical data model and database design.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Worked on designing Conceptual, Logical and Physical data models and performed data design reviews with the Project team members.
- Designed a star schema for sales data involving shared (conformed) dimensions using Erwin Data Modeler.
- Built the logical data model from scratch using XML files as the data source.
- Built data models to convert data from one application to another in a way that suits the needs of the target database.
- Involved in versioning and saving the models to the data mart and maintaining the Data mart Repository.
- Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns.
- Built Data Lake in Azure using Hadoop (HDInsight clusters) and migrated Data using Azure Data Factory pipeline.
- Designed Lambda architecture to process streaming data using Spark. Data was ingested using Sqoop for structured data and Kafka for unstructured data.
- Created Azure Event Hubs, Azure Service Bus, Azure Stream Analytics, and Power BI solutions for handling IoT messages.
- Ensured data warehouse and data mart designs efficiently supported the reporting and BI team requirements.
- Performed Hive programming for applications that were migrated to big data using Hadoop.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis that met the business requirements (see the sketch after this section).
- Produced 3NF data models for OLTP designs using data modeling best practices and modeling skills.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Reverse-engineered data models from database instances and scripts.
- Created data models for different databases such as Oracle and SQL Server.
- Responsible for defining the naming standards for the data warehouse.
- Enforced Referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
- Created source-to-target mapping documents to guide the data model design from the data source to the data model.
- Involved in unit and system testing of OLAP report functionality and validating the data displayed in the reports.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Data governance of the Raw, Staging, Curated, and Presentation layers in Azure Data Lake Store.
- Involved in writing T-SQL, working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
- Involved in data loading using PL/SQL scripts and SQL Server Integration Services (SSIS).
- Conducted and participated in JAD sessions with the users, modelers, and developers for resolving issues.
- Applied data naming standards, created the data dictionary and documented data model translation decisions and also maintained DW metadata.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Participated in Performance Tuning using Explain Plan and TKPROF.
- Performance tuning and stress-testing of NoSQL database environments in order to ensure acceptable database performance in production mode.
Environment: Erwin 9.7, Hadoop 3.0, NoSQL, PL/SQL, T-SQL, SSIS, UNIX, Spark, Azure Data Lake, OLTP, Azure SQL DB, Azure SQL DW
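A minimal sketch, with a hypothetical schema, of the Parquet-on-Hive analysis referenced above: running a HiveQL aggregation over a Parquet-backed Hive table from Spark.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of analyzing a Parquet-backed Hive table with HiveQL from
// Spark. The database, table, columns, and filter are hypothetical examples.
object ParquetHiveAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-hive-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // Hive reads the Parquet files directly; predicate pushdown and column
    // pruning keep the scan limited to the columns and rows requested.
    spark.sql("""
      SELECT region, COUNT(*) AS orders, AVG(amount) AS avg_amount
      FROM curated.orders_parquet
      WHERE order_date >= '2018-01-01'
      GROUP BY region
      ORDER BY orders DESC
    """).show()

    spark.stop()
  }
}
```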
Confidential, Raleigh, NC
Sr. Data Analyst/Data Modeler
- As a Data Analyst/Modeler, responsible for the conceptual, logical, and physical models for the Supply Chain project.
- Participated in JAD sessions involving the discussion of various reporting needs.
- Translated the conceptual model into a logical data model, held JAD sessions, and communicated data-related issues and standards.
- Interacted with the Subject Matter Experts (SME's) and Stakeholders to get a better understanding of client business processes and gather business requirements.
- Assisted in the analysis of and recommendations on reporting tool selection.
- Created database tables, views, indexes, triggers, and sequences, and developed the database structure.
- Wrote complex SQL and PL/SQL procedures, functions, and packages to validate data and support the testing process.
- Generated reports using SQL Server Reporting Services from OLTP and OLAP data sources.
- Designed and Developed Use Cases, Activity Diagrams, and Sequence Diagrams using Unified Modeling Language (UML).
- Designed Star and Snowflake Data Models for Enterprise Data Warehouse using Power Designer.
- Developed, documented and maintained logical and physical data models for development projects.
- Identified the facts and dimensions and designed star schema model for generating reports.
- Documented Technical & Business User Requirements during requirements gathering sessions.
- Involved in modeling business processes through UML diagrams.
- Created entity process association matrices, functional decomposition diagrams and data flow diagrams from business requirements documents.
- Used Sybase Power Designer tool for relational database and dimensional data warehouse designs.
- Worked alongside the database team to generate the best Physical Model from the Logical Model using Power Designer.
- Developed Cleansing and data migration rules for the Integration Architecture (OLTP, ODS, DW).
- Developed data mapping documents between Legacy, Production, and User Interface Systems.
- Used Crystal Reports to generate ad-hoc reports.
Environment: SQL, PL/SQL, OLTP, OLAP, SQL Server 2012, Sybase Power Designer 16.5
- Analyzed functional and non-functional categorized data elements for data migration, data profiling, and mapping from the source to the target data environment. Developed working documents to support findings and assign specific tasks.
- Participated in requirements sessions with IT business analysts, SMEs, and business users to understand and document the business requirements and the goals of the project.
- Used and supported database applications and tools for extraction, transformation, and analysis of raw data.
- Developed complex T-SQL code such as Stored Procedures, functions, triggers, Indexes, and views for the business application.
- Involved in the complete SSIS life cycle: creating SSIS packages and building, deploying, and executing the packages in all environments (QA, Development, and Production).
- Created SSIS packages for migration of data to the MS SQL Server database from other databases and sources such as flat files, MS Excel, Sybase, and CSV files.
- Optimized stored procedures using temp tables and indexing strategies to increase speed and reduce runtime.
- Automated MS Access and Excel processes by rewriting them as SQL views and tables.
- Developed reports for users in different departments in the organization using SQL Server Reporting Services (SSRS).
- Designed report models based on user requirements and used report builder to generate the reports.
- Used tools (Excel and SQL) to analyze, query, sort and manipulate data according to defined business rules and procedures.
- Performed data mining using complex SQL queries and discovered patterns.
- Extensively used MS Access to pull data from various databases and integrate the data.
- Developed SQL, BTEQ (Teradata) queries for Extracting data from production database and built data structures, reports.
- Performed in-depth data analysis and prepared weekly, biweekly, and monthly reports using SQL, SAS, MS Excel, MS Access, and UNIX.
Environment: T-SQL, SSIS, MS SQL, MS Excel, MS Access, SQL queries, BTEQ, UNIX