
Sr. Big Data Engineer Resume

Arlington, VA

SUMMARY:

  • Over 7 years of IT experience as a Big Data Engineer, Data Engineer, and Programmer Analyst.
  • Hands-on experience in SQL queries and PL/SQL programming; created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
  • Hands-on experience in writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza, and Teradata.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF, and BCNF) and De-normalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
  • Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
  • Experienced in R and Python for statistical computing; also experienced with MLlib (Spark), MATLAB, Excel, Minitab, SPSS, and SAS.
  • Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop).
  • Good experience in using SSRS and Cognos in creating and managing reports for an organization.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Expertise in Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.
  • Strong experience working with databases like Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
  • Good understanding of Apache Spark High level architecture and performance tuning patterns.
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
  • Worked with NoSQL databases like HBase, Cassandra, and MongoDB for extracting information and storing huge amounts of data.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores and relational databases.
  • Experienced in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra.
  • Developed PL/SQL programs (Functions, Procedures, Packages and Triggers).
  • Good experience in using Sqoop for traditional RDBMS data pulls.
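As an illustration of the Sqoop-based RDBMS pulls mentioned above, a minimal import might be composed as follows (the connection string, table name, and target directory are hypothetical placeholders, not from an actual engagement):

```python
# Minimal sketch of composing a Sqoop import command for a traditional
# RDBMS pull into HDFS. All connection details below are hypothetical.
def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Return the sqoop CLI arguments for a basic table import into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,               # JDBC connection string
        "--table", table,                    # source RDBMS table
        "--target-dir", target_dir,          # HDFS destination directory
        "--num-mappers", str(num_mappers),   # parallel map tasks
    ]

cmd = build_sqoop_import("jdbc:oracle:thin:@//db-host:1521/ORCL",
                         "ORDERS", "/data/staging/orders")
print(" ".join(cmd))
```

In practice the argument list would be handed to the cluster's `sqoop` launcher; composing it in one place keeps the mapper count and target paths consistent across jobs.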

TECHNICAL SKILLS:

Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11

Data Modeling Tools: Erwin r9.7, ER/Studio v16

BI Tools: Tableau 10, SAP Business Objects, Crystal Reports

Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell scripting, Perl, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

PROFESSIONAL EXPERIENCE:

Confidential - Arlington, VA

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, worked on Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, and Hive.
  • Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Created data integration and technical solutions for Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure SQL databases and Azure SQL Data Warehouse for providing analytics.
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Installed and configured Hadoop ecosystem like HBase, Flume, Pig and Sqoop.
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked on Hive queries to categorize data of different wireless applications and security systems.
  • Responsible for loading and transforming huge sets of structured, semi-structured, and unstructured data.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Involved in Data Architecture, Data profiling, Data analysis, data mapping and Data architecture artifacts design.
  • Created linked services to connect to Azure Storage, on-premises SQL Server, and Azure HDInsight.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Created a Data Pipeline using Processor Groups and multiple processors in Apache NiFi for flat files and RDBMS sources as part of a POC on Amazon EC2.
  • Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
  • Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
  • Configured Azure SQL database with Azure storage Explorer and with SQL server.
  • Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like Erwin.
  • Designed class and activity diagrams using Power Designer and UML tools like Visio.
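The Star Schema design used for the data marts above can be sketched with a toy fact/dimension pair; table and column names here are illustrative only, not from the actual warehouse:

```python
import sqlite3

# Toy star-schema sketch: one fact table keyed to one dimension table.
# Table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE dim_device (
    device_key  INTEGER PRIMARY KEY,
    device_type TEXT)""")
cur.execute("""CREATE TABLE fact_events (
    event_id    INTEGER PRIMARY KEY,
    device_key  INTEGER REFERENCES dim_device(device_key),
    event_count INTEGER)""")
cur.executemany("INSERT INTO dim_device VALUES (?, ?)",
                [(1, "router"), (2, "sensor")])
cur.executemany("INSERT INTO fact_events VALUES (?, ?, ?)",
                [(10, 1, 5), (11, 1, 3), (12, 2, 7)])

# The typical star-schema query shape: aggregate the fact table,
# grouped by an attribute from the joined dimension.
rows = cur.execute("""
    SELECT d.device_type, SUM(f.event_count)
    FROM fact_events f JOIN dim_device d USING (device_key)
    GROUP BY d.device_type ORDER BY d.device_type""").fetchall()
print(rows)  # [('router', 8), ('sensor', 7)]
```

The same join-and-aggregate shape carries over to Hive tables; the narrow fact table plus descriptive dimensions is what makes the historical trend queries cheap.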

Environment: Hadoop 3.0, SDLC, Azure, HBase 1.2, Pig 0.17, Sqoop 1.4, Zookeeper, Oozie 4.3, SQL, HDFS, Hive 2.3, PL/SQL, Erwin 9.8, Scala, Apache NiFi, ETL, Excel, Flume 1.8.

Confidential - Lowell, AR

Data Engineer

Responsibilities:

  • Worked as a Big Data implementation engineer within a team of professionals.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Worked on End to End Software Development Life Cycle process in Agile Environment using Scrum methodologies.
  • Used forward engineering to generate DDL from the Physical Data Model and handed it to the DBA.
  • Created external tables pointing to HBase to access tables with a huge number of columns.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Involved in writing complex SQL Queries and provided SQL Scripts for the Configuration Data which is used by the application.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Installed and configured a multi-node cluster on the cloud using Amazon Web Services (AWS) EC2.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Integrated NoSQL databases like HBase with MapReduce to move bulk data into HBase.
  • Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
  • Rendered and delivered reports in desired formats by using reporting tools such as Tableau.
  • Developed Pig scripts to transform the data into structured format, automated through Oozie coordinators.
  • Assisted in designing, development and architecture of Hadoop and HBase systems.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Developed and maintained data dictionary to create metadata reports for technical and business purpose.
  • Implemented AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Involved with Data Analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Wrote SQL Scripts and PL/SQL Scripts to extract data from Database to meet business requirements and for Testing Purposes.
  • Involved in manipulating, cleansing, and processing data using Excel and SQL; responsible for loading, extracting, and validating client data.
  • Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard by using parameters.
  • Developed Python scripts to automate and provide Control flow to Pig scripts.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
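The Python control flow around Pig scripts described above can be sketched roughly as follows; the script names are hypothetical placeholders, and a fake runner stands in for the real `pig` launcher so the flow can be exercised off-cluster:

```python
import subprocess

# Minimal sketch of Python control flow around Pig scripts: run each
# script in order and stop the chain on the first failure.
# Script names below are hypothetical placeholders.
def run_pig_pipeline(scripts, runner=subprocess.run):
    """Run Pig scripts sequentially; return the names that succeeded."""
    completed = []
    for script in scripts:
        result = runner(["pig", "-f", script])
        if result.returncode != 0:   # stop on the first failed stage
            break
        completed.append(script)
    return completed

# A fake runner lets the control flow be tested without a cluster.
class _FakeResult:
    def __init__(self, rc):
        self.returncode = rc

def fake_runner(cmd):
    # Pretend the second stage fails.
    return _FakeResult(1 if "clean.pig" in cmd[-1] else 0)

done = run_pig_pipeline(["load.pig", "clean.pig", "report.pig"], fake_runner)
print(done)  # ['load.pig']
```

Injecting the runner keeps the orchestration logic testable; in production the default `subprocess.run` invokes the real Pig client on each staged script.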

Environment: Hadoop 3.0, Agile, HDFS, HBase 1.2, Scala, Cassandra 3.1, SQL, ETL, AWS, Sqoop 1.4, Hive 2.3, MapReduce, Pig 0.17, Oracle 12c, Oozie 4.3, Tableau, OLAP, PL/SQL, Kafka 1.0.

Confidential - Houston, TX

Data Modeler

Responsibilities:

  • Understood and translated business needs into data models supporting underwriting workstation services.
  • Created DDL scripts using Erwin and source to target mappings to bring the data from source to the warehouse.
  • Developed dimensional model for Data Warehouse/OLAP applications by identifying required facts and dimensions.
  • Designed Star schemas for the detailed data marts and plan data marts consisting of conformed dimensions.
  • Developed logical data models and physical database design and generated database schemas using Erwin.
  • Reverse Engineered the existing Stored Procedures and wrote Mapping Documents for them.
  • Developed stored procedures, triggers, packages, functions, and exceptions using PL/SQL.
  • Designed both 3NF data models for OLTP systems and dimensional data models for OLAP systems.
  • Worked on the reporting requirements and was involved in generating the reports for the data model using Crystal Reports.
  • Conducted design walk through sessions with Business Intelligence team to ensure that reporting requirements are met for the business.
  • Validated existing Data Quality rules to ensure they meet Data Governance requirements.
  • Involved in writing queries and stored procedures using MySQL and SQL Server.
  • Created data masking mappings to mask the sensitive data between production and test environment.
  • Developed solutions for data quality issues and collaborate with the business and IT to implement those solutions.
  • Created SQL queries using TOAD and SQL Navigator, and created various database objects such as stored procedures, tables, and views.
  • Created DataStage jobs (ETL process) for populating the data into the Data Warehouse constantly from different source systems.
  • Worked on Metadata exchange among various proprietary systems using XML.
  • Extracted data from Oracle and loaded it into Teradata tables using the Teradata utilities FastLoad and MultiLoad.
  • Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.
  • Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
  • Used the data vault modeling method, which was adaptable to the needs of this project.
  • Created business requirement documents and integrated the requirements and underlying platform functionality.
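The data vault method noted above centers on hubs keyed by business keys; a rough sketch of building a hub row with a hashed business key follows (hashing the key is a common Data Vault 2.0 convention, and the column names here are illustrative, not from the project):

```python
import hashlib
from datetime import datetime, timezone

# Rough sketch of a Data Vault hub row: the business key is normalized
# and hashed to produce a stable surrogate key. Column names are
# illustrative only.
def hub_row(business_key, record_source):
    hub_key = hashlib.md5(business_key.upper().encode("utf-8")).hexdigest()
    return {
        "hub_customer_key": hub_key,     # hash of the normalized business key
        "customer_id": business_key,     # original business key
        "load_date": datetime.now(timezone.utc).isoformat(),
        "record_source": record_source,  # lineage of the row
    }

row = hub_row("cust-10042", "CRM")
print(row["hub_customer_key"][:8])
```

Because the hash is derived only from the normalized business key, any source feeding the same key lands on the same hub row, which is what makes the model adaptable as new sources arrive.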

Environment: Erwin 9.5, Teradata 14.0, Oracle 11g, SQL, PL/SQL, OLAP, OLTP, TOAD, ETL, XML, MySQL, Crystal reports 14.1x.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Worked as a Data Analyst/Modeler to generate Data Models and subsequent deployment to Enterprise Data Warehouse.
  • Conducted source data analysis of various data sources and developed source-to-target mappings with business rules.
  • Conducted data modeling JAD sessions and communicated data-related standards.
  • Generated DDL statements for the creation of new ER/Studio objects like tables, views, indexes, packages, and stored procedures.
  • Designed and developed Oracle PL/SQL and shell scripts for data import/export, data conversions, and data cleansing.
  • Performed reverse engineering on the existing data model to understand the data flow and business flows.
  • Performed Data Profiling to identify data issues upfront, provided SQL prototypes to confirm the business logic provided prior to the development.
  • Designed the Data Model/Data exchange Metadata Model for All Interfaces and Data Exchanges
  • Developed Conceptual, Logical and Physical data models for central model consolidation.
  • Provided PL/SQL queries to developers as source queries to identify the data, along with the logic to assign it.
  • Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, and OOD (Object-Oriented Design) using UML and Visio.
  • Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management.
  • Developed and deployed quality T-SQL codes, stored procedures, views, functions, triggers and jobs.
  • Effectively used triggers and stored procedures necessary to meet specific application's requirements.
  • Designed and Maintained Data Model for OLTP systems and OLAP systems, ODS and Data Marts using 3NF and Dimensional Design
  • Created SQL scripts for database modification and performed multiple data modeling tasks at the same time under tight schedules.
  • Used the DataStage Designer to design and develop jobs for extracting, cleansing, transforming, integrating, and loading data into different Data Marts.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XIR2.
  • Performed analysis and presented results using SQL, SSIS, Excel, and Visual Basic scripts.
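The upfront data profiling described above can be sketched as a small pass over a sample extract, counting nulls and distinct values per column; the sample records below are invented for illustration:

```python
# Minimal data-profiling sketch: per-column null counts and distinct
# counts over a sample extract. The records below are invented.
def profile(rows):
    """Return {column: (null_count, distinct_count)} for a list of dicts."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        nulls = sum(v is None for v in values)
        distinct = len({v for v in values if v is not None})
        stats[col] = (nulls, distinct)
    return stats

sample = [
    {"policy_id": "P1", "state": "TX"},
    {"policy_id": "P2", "state": None},
    {"policy_id": "P3", "state": "TX"},
]
print(profile(sample))  # {'policy_id': (0, 3), 'state': (1, 1)}
```

A profile like this surfaces candidate keys (zero nulls, all distinct) and low-cardinality or sparsely populated columns before the business logic is coded, which is the point of doing it upfront.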

Environment: ER/Studio, Oracle 11g, SQL, PL/SQL, T-SQL, ODS, OLAP, OLTP, Business Objects, SSIS, MS Excel 2012
