
Sr. Big Data Engineer Resume

Chicago, IL

PROFILE SUMMARY:

  • Over 8 years of IT experience as a Big Data Engineer/Data Engineer, covering design and data analysis as a Big Data/Hadoop professional with applied information technology.
  • Experience in Data modeling, complex data structures, Data processing, Data quality, Data lifecycle.
  • Experience with the Amazon AWS cloud, including services such as EC2, S3, EBS, ELB, AMI, IAM, Route 53, Auto Scaling, CloudFront, CloudWatch, and Security Groups.
  • A very good understanding of job workflow scheduling and monitoring tools like Oozie and ControlM.
  • Experience in metadata design and real-time BI architecture, including Data Governance, for greater ROI.
  • Experienced in designing architecture for data warehouse modeling using tools like Erwin r9.6/r9.5, Sybase PowerDesigner, and ER/Studio.
  • Proficient in System Analysis, ER/Dimensional Data Modeling, Database design and implementing RDBMS specific features.
  • Well versed with Data Migration, Data Conversion, and Data Extraction/Transformation/Loading (ETL).
  • Experience with Object Oriented Analysis and Design (OOAD) using UML, Rational Unified Process (RUP), Rational Rose and MS Visio.
  • Experienced in Developing Triggers, Batch Apex, and Scheduled Apex classes.
  • Experience in building high performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Solr and Kafka.
  • Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi, and Flume.
  • Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Experience in Teradata RDBMS using the FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant, and BTEQ utilities.
  • Experienced in Data Modeling including Data Validation/Scrubbing and Operational assumptions.
  • Very good knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and Identifying Data Mismatch.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper and Apache Storm.
  • Experience writing MapReduce programs on Apache Hadoop to work with Big Data.
  • Strong experience working with conceptual, logical and physical data modeling considering Metadata standards.
  • Experience working with Agile and Waterfall data modeling methodologies.
  • Experience with both the Ralph Kimball and Bill Inmon data warehousing approaches.
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNodes and MapReduce concepts.
  • Strong knowledge in working with UNIX/LINUX environments, writing shell scripts and PL/SQL Stored Procedures.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala (a minimal sketch of this pattern appears after this list).
  • Developed Apache Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Hands-on experience developing and debugging YARN (MRv2) jobs to process large datasets.
  • Processed data using MapReduce and YARN; worked on Kafka as a proof of concept for log processing.
  • Worked with Oozie workflow engine to schedule time based jobs to perform multiple actions.
  • Experienced in importing and exporting data from RDBMS into HDFS using Sqoop.
  • Hands-on experience working with databases like Oracle and MySQL, and with PL/SQL.
  • Experienced in developing Web Services with Python programming language.
  • Experience in Performance Tuning, Optimization and Customization.
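
As noted in the Spark migration bullet above, a MapReduce job's map and reduce phases translate directly into Spark RDD transformations. A minimal Scala sketch of that pattern, where the input path, delimiter, and field positions are illustrative assumptions rather than details from an actual engagement:

```scala
import org.apache.spark.sql.SparkSession

object EventCountMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mapreduce-to-spark-poc")
      .getOrCreate()
    val sc = spark.sparkContext

    // The legacy MapReduce job emitted (eventType, 1) from the mapper and summed in the reducer;
    // the same logic expressed as RDD transformations:
    val counts = sc.textFile("hdfs:///data/raw/events/*.log")    // hypothetical input path
      .map(_.split('\t'))
      .filter(_.length > 2)                                       // drop malformed records
      .map(fields => (fields(1), 1L))                             // key by event type (illustrative field)
      .reduceByKey(_ + _)                                         // reduce phase: sum the counts

    counts.saveAsTextFile("hdfs:///data/processed/event_counts")  // hypothetical output path
    spark.stop()
  }
}
```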

SKILLS:

Big Data Eco-System: Hadoop 3.0, HDFS, MapReduce, Hive 2.3, Pig, HBase 1.2, Spark 2.2, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4/CDH5, Hortonworks, Hadoop Streaming, Splunk, Zookeeper 3.4, Oozie, Sqoop, Flume 1.8.

Cloud Management: EC2, S3 Bucket, AMI, RDS, Redshift, Azure, Azure Data Factory, Azure Data Lake

Data Modeling Tools: ER/Studio V17, Erwin 9.6/9.5, Sybase PowerDesigner.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe

Operating Systems: Windows, Macintosh, Linux, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend and Tableau.

Languages: SQL, Shell Scripting, C/C++, Python 3.6, R, Scala

DBMS / RDBMS: Oracle 12c, SQL Server 2016/2014, DB2, Teradata 15/14

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

WORK EXPERIENCE:

Confidential, Chicago, IL

Sr. Big data Engineer

Responsibilities:

  • Extensively involved in the design phase and delivered design documents for the Hadoop ecosystem covering HDFS, Hive, Pig, Sqoop, and Spark with Scala.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
  • Involved in the high-level design of the Hadoop architecture for the existing data structure and business process.
  • Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
  • Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases, and Sqoop.
  • Involved in loading disparate datasets into the Hadoop Data Lake, making them available to the data science team for predictive modeling.
  • Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management (MDM).
  • Implemented Partitioning, Dynamic Partitions, and Buckets in Hive to improve query performance and organize data logically.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Pulled data from the Amazon S3 bucket into the Data Lake, built Hive tables on top of it, and created DataFrames in Spark to perform further analysis.
  • Deployed Hadoop applications on a multi-node cloud cluster using S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Explored MLlib algorithms in Spark to understand the Machine Learning functionality available for the use case.
  • Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
  • In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
  • Worked with commercial distributions of Hadoop including Hortonworks HDP, Cloudera CDH, and AWS (EMR, S3, and EC2).
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in loading data from UNIX file system to HDFS using Flume and HDFS API.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (a minimal sketch of this pipeline appears after this list).
  • Participated in design reviews, code reviews, unit testing and integration testing.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformations to load data from the Hadoop Data Lake into HBase.
  • Exported the analyzed data to HBase (NoSQL) for visualization and to generate reports for the Business Intelligence team using SAS.
  • Created Hive internal or external tables as required, designed for efficiency.
  • Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Used JIRA for bug tracking and GIT for version control.
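
A minimal sketch of the Kafka-to-HDFS Spark Streaming pipeline referenced above, assuming the spark-streaming-kafka-0-10 connector is on the classpath; the broker list, consumer group, topic, batch interval, and output path are illustrative placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    // Kafka consumer settings; broker list, group id, and topic are illustrative.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "log-ingest",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Persist each micro-batch of raw log lines to HDFS as text files.
    stream.map(_.value()).saveAsTextFiles("hdfs:///data/streaming/app-logs/part", "txt")

    ssc.start()
    ssc.awaitTermination()
  }
}
```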

Environment: Hadoop 3.0, HDFS, Hive 2.3, Pig, Sqoop, Spark 2.2, Scala, HBase 1.2, Kafka, Elasticsearch, MapReduce, MLlib, Flume 1.8, Python, AWS, Web Services, GIT, JIRA, MDM

Confidential, Arlington, VA

Data Engineer

Responsibilities:

  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters with agile methodology.
  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools like HBase and Sqoop.
  • Developed MapReduce programs to perform data filtering for unstructured data.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Impala.
  • Successfully loaded files to Hive and HDFS from HBase.
  • Worked on Classic and YARN distributions of Hadoop, including Apache Hadoop, Cloudera CDH4, and CDH5.
  • Created and altered HBase tables on top of data residing in Data Lake.
  • Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and Hive.
  • Created and managed S3 buckets and policies for storage and backup purposes.
  • Worked on developing ETL processes to load data from multiple data sources to HDFS using Flume and Sqoop.
  • Performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
  • Worked on the disaster recovery plan for the Hadoop cluster by implementing cluster data backups in Amazon S3 buckets.
  • Installed and configured the Zookeeper service to coordinate configuration-related information across all nodes in the cluster and manage it efficiently.
  • Involved in converting HBase/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools like SQL Profiler and Database Tuning Advisor (DTA).
  • Worked with multiple teams to understand their business requirements and the data in the source files.
  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements (see the sketch after this list).
  • Explored Spark for improving performance and optimizing existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
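
A minimal sketch of the kind of end-to-end Spark cleansing/validation/summarization application described above; the source path, column names, and business rules are illustrative placeholders, not details from the actual engagement:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerCleanseJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-cleanse-summarize")
      .getOrCreate()

    // Hypothetical landing-zone CSV extract with a header row.
    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///landing/customers/*.csv")

    val cleansed = raw
      .na.drop(Seq("customer_id"))                          // validation: drop rows missing the key
      .withColumn("state", upper(trim(col("state"))))       // cleansing: normalize text fields
      .withColumn("balance", col("balance").cast("double")) // transformation: enforce numeric type
      .filter(col("balance") >= 0)                          // illustrative business rule

    // Summarization: per-state totals written back to the curated zone.
    cleansed.groupBy("state")
      .agg(count(col("customer_id")).as("customers"), sum("balance").as("total_balance"))
      .write.mode("overwrite")
      .parquet("hdfs:///curated/customer_balance_by_state")

    spark.stop()
  }
}
```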

Environment: Hadoop 3.0, HBase 1.2, Sqoop, MapReduce, Pig, Hive 2.3, Impala, HDFS, Zookeeper, SQL, Spark, Scala, Python, YARN

Confidential, St. Louis, MO

Sr. Data Architect/Data Modeler

Responsibilities:

  • As a Sr. Data Architect/Modeler collaboratively worked with the Data modeling architects and other data modelers in the team to design the Enterprise Level Standard Data model.
  • Interacted with users for verifying User Requirements, managing Change Control Process, updating existing Documentation.
  • Worked with the architecture and development teams to help choose data-related technologies, design architectures, and model data in a manner that is efficient, scalable, and supportable.
  • Worked closely with the development and database administrators to guide the development of the physical data model and database design.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Worked on designing Conceptual, Logical and Physical data models and performed data design reviews with the Project team members.
  • Designed a STAR schema for sales data involving shared dimensions (Conformed) using Erwin Data Modeler.
  • Worked on building the logical data model from scratch using XMLs as the data source.
  • Worked on building data models to convert data from one application to another in a way that suits the needs of the target database.
  • Involved in versioning and saving the models to the data mart and maintaining the Data mart Repository.
  • Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns.
  • Built Data Lake in Azure using Hadoop (HDInsight clusters) and migrated Data using Azure Data Factory pipeline.
  • Designed Lambda architecture to process streaming data using Spark. Data was ingested using Sqoop for structured data and Kafka for unstructured data.
  • Created Azure Event Hubs, Azure Service Bus, Azure Analysis Services, and Power BI for handling IoT messages.
  • Ensured that the data warehouse and data mart designs efficiently support the reporting and BI team requirements.
  • Performed Hive programming for applications that were migrated to big data using Hadoop.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Executed Hive queries on Parquet tables to perform data analysis and meet the business requirements (see the sketch after this list).
  • Produced 3NF data models for OLTP designs using data modeling best practices and modeling skills.
  • Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
  • Reverse engineered data models from database instances and scripts.
  • Created data models for different databases like Oracle and SQL Server.
  • Responsible for defining the naming standards for the data warehouse.
  • Enforced Referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
  • Created Source to Target Mapping Documents to help guide the data model design from the Data source to the data model.
  • Involved in unit and system testing of the OLAP report functionality, validating the data displayed in the reports.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Handled data governance of the Raw, Staging, Curated, and Presentation layers in Azure Data Lake Store.
  • Involved in writing T-SQL, working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
  • Involved in data loading using PL/SQL scripts and SQL Server Integration Services (SSIS).
  • Conducted and participated in JAD sessions with the users, modelers, and developers for resolving issues.
  • Applied data naming standards, created the data dictionary and documented data model translation decisions and also maintained DW metadata.
  • Created data masking mappings to mask the sensitive data between production and test environment.
  • Participated in Performance Tuning using Explain Plan and TKPROF.
  • Performance tuning and stress-testing of NoSQL database environments in order to ensure acceptable database performance in production mode.
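
A minimal sketch of analyzing a Parquet-backed Hive table from Spark, as referenced in the Hive query bullet above; the database, table, columns, and partition filter are illustrative placeholders:

```scala
import org.apache.spark.sql.SparkSession

object ParquetHiveAnalysis {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark resolve the metastore tables referenced in the Hive work above.
    val spark = SparkSession.builder()
      .appName("parquet-hive-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate a hypothetical Parquet-backed sales table by region for one partition.
    val monthlySales = spark.sql(
      """
        |SELECT region, SUM(sale_amount) AS total_sales
        |FROM sales_db.sales_parquet
        |WHERE sale_month = '2018-06'
        |GROUP BY region
        |ORDER BY total_sales DESC
      """.stripMargin)

    monthlySales.show(20, truncate = false)
    spark.stop()
  }
}
```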

Environment: Erwin 9.7, Hadoop 3.0, NoSQL, PL/SQL, T-SQL, SSIS, UNIX, Spark, Azure Data Lake, OLTP, Azure SQL DB and Azure SQL DW.

Confidential, Raleigh, NC

Sr. Data Analyst/Data Modeler

Responsibilities:

  • As a Data Analyst/Modeler responsible for Conceptual, Logical and Physical model for Supply Chain Project.
  • Participated in JAD sessions involving the discussion of various reporting needs.
  • Translated conceptual models into logical data models, conducted JAD sessions, and communicated data-related issues and standards.
  • Interacted with the Subject Matter Experts (SME's) and Stakeholders to get a better understanding of client business processes and gather business requirements.
  • Assisted in analysis and recommendations for selecting reporting tools.
  • Created database tables, views, indexes, triggers, and sequences and developed the database structure.
  • Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and support the testing process.
  • Generated reports using SQL Server Reporting Services from OLTP and OLAP data sources.
  • Designed and Developed Use Cases, Activity Diagrams, and Sequence Diagrams using Unified Modeling Language (UML).
  • Designed Star and Snowflake Data Models for Enterprise Data Warehouse using Power Designer.
  • Developed, documented and maintained logical and physical data models for development projects.
  • Identified the facts and dimensions and designed star schema model for generating reports.
  • Documented Technical & Business User Requirements during requirements gathering sessions.
  • Involved in modeling business processes through UML diagrams.
  • Created entity process association matrices, functional decomposition diagrams and data flow diagrams from business requirements documents.
  • Used Sybase Power Designer tool for relational database and dimensional data warehouse designs.
  • Worked alongside the database team to generate the best Physical Model from the Logical Model using Power Designer.
  • Developed Cleansing and data migration rules for the Integration Architecture (OLTP, ODS, DW).
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Used Crystal Reports to generate ad-hoc reports.

Environment: SQL, PL/SQL, OLTP, OLAP, SQL Server 2012, Sybase Power Designer 16.5

Confidential

Data Analyst

Responsibilities:

  • Analysis of functional and non-functional categorized data elements for Data Migration, data profiling and mapping from source to target data environment. Developed working documents to support findings and assign specific tasks.
  • Participated in requirements session with IT Business Analysts, SME's and business users to understand and document the business requirements as well as the goals of the project.
  • Used and supported database applications and tools for extraction, transformation, and analysis of raw data.
  • Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the business application.
  • Involved in the complete SSIS life cycle: creating SSIS packages and building, deploying, and executing the packages in all environments (QA, Development, and Production).
  • Created SSIS packages for migration of data to the MS SQL Server database from other databases and sources like flat files, MS Excel, Sybase, and CSV files.
  • Optimized stored procedures using temp tables and indexing strategies to increase speed and reduce runtime.
  • Automated MS Access and Excel processes by rewriting them as SQL views and tables.
  • Developed reports for users in different departments in the organization using SQL Server Reporting Services (SSRS).
  • Designed report models based on user requirements and used report builder to generate the reports.
  • Used tools (Excel and SQL) to analyze, query, sort and manipulate data according to defined business rules and procedures.
  • Performed data mining using complex SQL queries and discovered patterns.
  • Extensively used MS Access to pull data from various databases and integrate the data.
  • Developed SQL and BTEQ (Teradata) queries for extracting data from the production database and built data structures and reports.
  • Performed in-depth data analysis and prepared weekly, biweekly, and monthly reports using SQL, SAS, MS Excel, MS Access, and UNIX.

Environment: T-SQL, SSIS, MS SQL, MS Excel, MS Access, SQL queries, BTEQ, UNIX
