We provide IT Staff Augmentation Services!

Sr. Big Data Engineer Resume

Boston, MA

SUMMARY:

  • Above 8+ years of experience as Big Data Engineer/ Data Engineer and Data Analyst including designing, developing and implementation of data models for enterprise - level applications and systems.
  • Experience in Worked on NoSQL databases - HBase, Cassandra & MongoDB, database performance tuning & data modeling.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
  • Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, and EMR, Elastic search), Hadoop, Python, Spark and effective use of MapReduce, SQL and Cassandra to solve big data type problems.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
  • Hands on experience in Normalization (1NF, 2NF, 3NF and BCNF) Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Hands on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, PIG, HIVE, HBASE, Apache Crunch, ZOOKEEPER, SCIOOP, Hue, Scala and CHEF.
  • Experience in developing and designing POC's using Scala, Spark SQL and MLlib libraries then deployed on the Yarn cluster.
  • Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
  • Expertise in Data Architect, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Centre.
  • Experience in designing, building and implementing complete Hadoop ecosystem comprising of Map Reduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
  • Experience with Client-Server application development using Oracle PL/SQL, SQL PLUS, SQL Developer, TOAD, and SQL LOADER.
  • Strong experience with architecting highly per formant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio, Teradata, BTEQ, MLDM and MDM.
  • Experienced on R and Python for statistical computing. Also experience with MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS
  • Experienced on implementation of a log producer in Scala that watches for application logs, transform incremental log and sends them to a Kafka and Zookeeper based log collection platform.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Performed the performance and tuning at source, Target and Data Stage job levels using Indexes, Hints and Partitioning in DB2, ORACLE and Data Stage.
  • Good experience working on analysis tool like Tableau for regression analysis, pie charts, and bar graphs.
  • Experience in Data transformation, Data Mapping from source to target database schemas, Data Cleansing procedures.
  • Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Involve in writing SQL queries, PL/SQL programming and created new packages and procedures and modified and tuned existing procedure and queries using TOAD.
  • Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.

TECHNICAL SKILLS:

Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Stream sets, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL,Python, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Operating System: Windows 7/8/10, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

WORK EXPERIENCE:

Confidential - Boston, MA

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise and aptitude to Hadoop technologies as they related to the development of analytics.
  • Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
  • Assisted in leading the plan, building, and running states within the Enterprise Analytics Team.
  • Engaged in solving and supporting real business issues with your Hadoop distributed File systems and Open Source framework knowledge.
  • Performed detailed analysis of business problems and technical environments and use this data in designing the solution and maintaining data architecture.
  • Designed and developed software applications, testing, and building automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for use case.
  • In preprocessing phase of data extraction, we used Spark to remove all the missing data for transforming of data to create new features.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Analyzed Big Data Analytic technologies and applications in both business intelligence analyses.
  • Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
  • Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
  • Lead architecture and design of data processing, warehousing and analytics initiatives.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Implemented the Big Data solution using Hadoop, hive and Informatica to pull/load the data into the HDFS system.
  • Pulling the data from data lake (HDFS) and massaging the data with various RDD transformations.
  • Active involvement in design, new development and SLA based support tickets of Big Machines applications.
  • Developed Scala scripts, UDF's using both Data frames/SQL and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
  • Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, actively drive Proof of Concept (POC) and Proof of Technology (POT) evaluations and to implement a Big Data solution.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Involved in working of big data analysis using Pig and User defined functions (UDF).
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Load the data from different sources such as HDFS or HBase into Spark RDD and implement in memory data computation to generate the output response.
  • Developed complete end to end Big-data processing in Hadoop eco-system.
  • Built a data lake as a cloud based solution in AWS using Apache Spark and provide visualization of the ETL orchestration using CDAP tool.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Proof-of-concept to determine feasibility and product evaluation of Big Data products
  • Writing Hive join query to fetch info from multiple tables, writing multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Developed in scheduling Oozie workflow engine to run multiple Hive and pig jobs.
  • Involved in developing MapReduce framework, writing queries scheduling map-reduce
  • Developed the code for Importing and exporting data into HDFS and Hive using Sqoop
  • Developed customized classes for serialization and De-serialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Implemented a proof of concept deploying this product in Amazon Web Services AWS.
  • Involved in migration of data from existing RDBMS (Oracle and SQL server) to Hadoop using Sqoop for processing data.
  • Worked on Hive query to process key, value pairs and upload the data to NoSQL database HBase.

Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Scala 2.12, Spark, Sqoop 1.4, Apache Nifi, HDFS, AWS, EC2, SQL server, Oracle 12c, Pig 0.17

Confidential - Hartford, CT

Big Data Engineer

Responsibilities:

  • Implemented the Big Data solution using Hadoop, hive and Informatica to pull/load the data into the HDFS system.
  • Installed and configured Hadoop ecosystem like HBase, Flume, Pig and Sqoop.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Worked with SME and conducted JAD sessions documented the requirements using UML and use case diagrams
  • Used SDLC Methodology of Data Warehouse development using Kanbanize.
  • Configured Apache Mahout Engine.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • Objective of this project is to build a data lake as a cloud based solution in AWS using Apache Spark.
  • Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Worked in exporting data from Hive tables into Netezza database.
  • Pulled the data from data lake (HDFS) and massaging the data with various RDD transformations.
  • Developed Scala scripts, UDF's using both Data frames/SQL and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat Files, RDBMS as part of a POC using Amazon EC2.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Load the data from different sources such as HDFS or HBase into Spark RDD and implement in memory data computation to generate the output response.
  • Developed complete end to end Big-data processing in Hadoop eco system.
  • Used AWS Cloud with Infrastructure Provisioning / Configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in different phases of Development life including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Utilized Oozie workflow to run Pig and Hive Jobs Extracted files from Cassandra through Sqoop and placed in HDFS and processed.
  • Continuously tuned Hive UDF's for faster queries by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.

Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache Nifi 1.7, Yarn, HBase, PL/SQL, MongoDB, Pig 0.17, Sqoop 1.4, Apache Flume 1.8

Confidential - Stamford, CT

Data Analyst/ Data Engineer

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
  • Researched, evaluated, architect, and deployed new tools, frameworks and patterns to build sustainable Big Data platforms.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
  • Designed and developed the conceptual then logical and physical data models to meet the needs of reporting.
  • Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
  • Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Worked with Hadoop eco system covering HDFS, HBase, YARN and Map Reduce.
  • Performed the Data Mapping, Data design (Data Modeling) to integrate the data across the multiple databases in to EDW.
  • Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
  • Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
  • Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
  • Performed reverse engineering of the dashboard requirements to model the required data marts.
  • Cleansed, extracted and analyzed business data on daily basis and prepared ad-hoc analytical reports using Excel and T-SQL
  • Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Conducted meetings with business and development teams for data validation and end-to-end data mapping.
  • Involved in debugging and Tuning the PL/SQL code, tuning queries, optimization for the Sql database.
  • Lead data migration from legacy systems into modern data integration frameworks from conception to completion.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems..
  • Generated DDL and created the tables and views in the corresponding architectural layers.
  • Handled importing of data from various data sources, performed transformations using Map Reduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop
  • Involved in performing extensive Back-End testing by writing SQL queries and PL/SQL stored procedures to extract the data from SQL Database.
  • Participate in code/design reviews and provide input into best practices for reports and universe development.
  • Involved in the validation of the OLAP, Unit testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
  • Created a high-level industry standard, generalized data model to convert it into logical and physical model at later stages of the project using Erwin and Visio
  • Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.

Environment: Erwin 9.7, HDFS, HBase, Hadoop 3.0, Metadata, MS Visio 2016, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.

Confidential - Chicago, IL

Sr. Data Analyst / Data Modeler

Responsibilities:

  • As a Sr. Data Analyst / Data Modeler I was responsible for all data related aspects of a project.
  • Participated in requirement gathering session, JAD sessions with users, Subject Matter experts, Architect's and BAs.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Translated business and data requirements into data models in support of Enterprise Data Models, Data Warehouse and Analytical systems.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
  • Participated in JAD sessions involving the discussion of various reporting needs.
  • Reverse Engineering the existing data marts and identified the Data Elements, Dimensions, Facts and Measures required for reports.
  • Extensively used PL/SQL in writing database packages, stored procedures, functions and triggers in Oracle.
  • Created data dictionaries for various data models to help other teams understand the actual purpose of each table and its columns.
  • Developed the required data warehouse model using Star schema for the generalized model.
  • Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
  • Used forward engineering approach for designing and creating databases for OLAP model.
  • Developed and maintained Data Dictionary to create Metadata Reports for technical and business purpose.
  • Worked with BI team in providing SQL queries, Data Dictionaries and mapping documents
  • Responsible for the analysis of business requirements and design implementation of the business solution.
  • Extensively involved in Data Governance that involved data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
  • Designed and Developed Oracle database Tables, Views, Indexes and maintained the databases by deleting and removing old data.
  • Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
  • Conducting user interviews, gathering requirements, analyzing the requirements using Rational Rose, Requisite pro RUP
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object oriented Design) using UML and Visio.
  • Created E/R Diagrams, Data Flow Diagrams, grouped and created the tables, validated the data.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Translated business concepts into XML vocabularies by designing Schemas with UML.

Environment: MS Visio 2014, PL/SQL, Oracle 11g, OLAP, XML, OLTP, SQL server, Transact-SQL

Confidential

Data Analyst

Responsibilities:

  • Worked closely with various business teams in gathering the business requirements.
  • Used the MS Excel, MS Access for data pulls and ad-hoc reports for analysis.
  • Worked with business analyst to design weekly reports using combination of Crystal Reports.
  • Experienced in data cleansing and Data migration for accurate reporting
  • Worked extensively on SQL querying using Joins, Alias, Functions, Triggers and Indexes.
  • Managed all indexing, debugging and query optimization techniques for performance tuning using T-SQL.
  • Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
  • Performed data analysis, statistical analysis, generated reports, listings and graphs using SAS tools-SAS/Base, SAS/Macros and SAS graph, SAS/SQL, SAS/Connect, and SAS/Access.
  • Wrote PL/SQL statement, stored procedures and Triggers in DB2 for extracting as well as writing data.
  • Developed SQL Server database to replace existing Access databases.
  • Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Developed SQL scripts involving complex joins for reporting purposes.
  • Assisted with designing database packages and procedures.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization.
  • Data analysis and reporting using MS Power Point, MS Access and SQL assistant.
  • Worked on CSV files while trying to get input from the MySQL database.
  • Created functions, triggers, views and stored procedures using MySQL.
  • Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
  • Created pivot tables and charts using worksheet Data and external resources, modified pivot tables, sorted items and group Data, and refreshed and formatted pivot tables,
  • Worked in importing and cleansing of data from various sources like flat files, MS SQL Server with high volume data.
  • Worked and extracted data from various database sources like DB2, CSV, XML and Flat files into the Data Stage.
  • Developed ad hoc reports using Crystal reports for performance analysis by business users.

Environment: Crystal Reports, T-SQL, SAS, PL/SQL, DB2, SQL Server, MS Power Point 2010, MS Access 2010, SQL assistant, MySQL

Hire Now