Sr. Big Data Engineer Resume
Arlington, VA
SUMMARY:
- Overall 7+ years of IT experience as a Big Data Engineer, Data Engineer, and Programmer Analyst.
- Hands-on experience writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
- Hands-on experience writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza, and Teradata.
- Hands-on experience with normalization (1NF, 2NF, 3NF, and BCNF) and de-normalization techniques for effective and optimal performance in OLTP and OLAP environments.
- Experience in designing, building, and implementing the complete Hadoop ecosystem, comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, along with Teradata, BTEQ, MLDM, and MDM.
- Experienced in R and Python for statistical computing, with additional experience in Spark MLlib, MATLAB, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop).
- Good experience using SSRS and Cognos to create and manage reports for an organization.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Expertise in Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.
- Strong experience working with databases such as Teradata and proficiency in writing complex SQL and PL/SQL to create tables, views, indexes, stored procedures, and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
- Good understanding of Apache Spark's high-level architecture and performance tuning patterns.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL (a brief sketch follows this summary).
- Worked with NoSQL databases like HBase, Cassandra, and MongoDB for information extraction and for storing huge amounts of data.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores and relational databases.
- Experienced in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra.
- Developed PL/SQL programs (Functions, Procedures, Packages and Triggers).
- Good experience in using Sqoop for traditional RDBMS data pulls.
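Illustrative sketch (not code from any actual engagement): a minimal Scala example of the kind of Spark SQL UDF and DataFrame work noted above. The table name (hypothetical_events), column names, and output path are hypothetical placeholders.

```scala
// Minimal Spark SQL sketch: register a UDF and use it from both the DataFrame
// API and a SQL query. All table/column names are hypothetical placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object SparkSqlUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-udf-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // UDF that normalizes country codes to a trimmed, upper-case form.
    val normalizeCountry = udf((code: String) =>
      Option(code).map(_.trim.toUpperCase).orNull)
    spark.udf.register("normalize_country", normalizeCountry)

    // DataFrame API usage against a hypothetical Hive table.
    val events = spark.table("hypothetical_events")
      .withColumn("country_code", normalizeCountry(col("country_code")))

    // Equivalent Spark SQL usage of the registered UDF.
    spark.sql(
      """SELECT normalize_country(country_code) AS country, COUNT(*) AS cnt
        |FROM hypothetical_events
        |GROUP BY normalize_country(country_code)""".stripMargin)
      .show()

    // Persist the normalized DataFrame to a placeholder HDFS location.
    events.write.mode("overwrite").parquet("hdfs:///tmp/hypothetical_events_clean")
    spark.stop()
  }
}
```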
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11
Data Modeling Tools: Erwin r9.7, ER Studio v16
BI Tools: Tableau 10, SAP Business Objects, Crystal Reports
Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX shell scripting, Perl, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
PROFESSIONAL EXPERIENCE:
Confidential - Arlington, VA
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, worked on Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive.
- Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Created data integration and technical solutions for Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure SQL databases and Azure SQL Data Warehouse for providing analytics.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Designed and developed Big Data analytics solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Installed, configured, and maintained the Hadoop cluster for application development, along with Hadoop ecosystem components such as Hive, Pig, HBase, Zookeeper, and Sqoop.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Worked on Hive queries to categorize data of different wireless applications and security systems.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Involved in data architecture, data profiling, data analysis, data mapping, and the design of data architecture artifacts.
- Created linked services to connect to Azure Storage, on-premises SQL Server, and Azure HDInsight.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Implemented logical and physical relational database designs and maintained database objects in the data model using Erwin.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Developed numerous MapReduce jobs in Scala for data cleansing and analyzed the data in Impala (a representative sketch appears at the end of this section).
- Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources, as part of a POC on Amazon EC2.
- Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
- Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production.
- Configured Azure SQL Database with Azure Storage Explorer and with SQL Server.
- Designed data marts following star schema and snowflake schema methodologies, using industry-leading data modeling tools such as Erwin.
- Designed class and activity diagrams using PowerDesigner and UML tools like Visio.
Environment: Hadoop 3.0, SDLC, Azure, HBase 1.2, Pig 0.17, Sqoop 1.4, Zookeeper, Oozie 4.3, SQL, HDFS, Hive 2.3, PL/SQL, Erwin 9.8, Scala, Apache Nifi, ETL, Excel, Flume 1.8.
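Illustrative sketch of the kind of Scala-based data-cleansing job referenced in this section (it uses Spark rather than classic MapReduce); the Hive table, columns, and HDFS path are hypothetical placeholders, not project artifacts.

```scala
// Sketch only: trim strings, normalize dates, drop incomplete and duplicate
// rows, then persist the cleansed data to HDFS. All names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, trim}

object CleansingJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cleansing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read raw records from a (hypothetical) data-lake table exposed through Hive.
    val raw = spark.table("datalake.raw_security_events")

    // Basic cleansing: trim identifiers, derive a date column, drop rows
    // missing key fields, and remove exact duplicates on the business key.
    val cleansed = raw
      .withColumn("device_id", trim(col("device_id")))
      .withColumn("event_date", to_date(col("event_ts")))
      .na.drop(Seq("device_id", "event_date"))
      .dropDuplicates("device_id", "event_ts")

    // Persist the cleansed data to HDFS as Parquet, partitioned by date.
    cleansed.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("hdfs:///data/cleansed/security_events")

    spark.stop()
  }
}
```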
Confidential - Lowell, AR
Data Engineer
Responsibilities:
- Worked as a Big Data implementation engineer within a team of professionals.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Worked on the end-to-end Software Development Life Cycle in an Agile environment using Scrum methodologies.
- Used forward engineering to generate DDL from the Physical Data Model and handed it to the DBA.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Involved in writing complex SQL Queries and provided SQL Scripts for the Configuration Data which is used by the application.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Integrated NoSQL databases like HBase with MapReduce to move bulk data into HBase.
- Developed code to extract data from Oracle Database and load it into the AWS platform using AWS Data Pipeline.
- Rendered and delivered reports in desired formats by using reporting tools such as Tableau.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Assisted in designing, development and architecture of Hadoop and HBase systems.
- Worked on configuring and managing disaster recovery and backup of Cassandra data.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Developed optimal strategies for distributing the web log data over the cluster and for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS (see the streaming sketch at the end of this section).
- Developed a Spark Streaming application to pull data from the cloud into Hive tables.
- Wrote SQL Scripts and PL/SQL Scripts to extract data from Database to meet business requirements and for Testing Purposes.
- Involved in manipulating, cleansing, and processing data using Excel and SQL; responsible for loading, extracting, and validating client data.
- Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard using parameters.
- Developed Python scripts to automate and provide Control flow to Pig scripts.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
Environment: Hadoop 3.0, Agile, HDFS, HBase 1.2, Scala, Cassandra 3.1, SQL, ETL, AWS, Sqoop 1.4, Hive 2.3, MapReduce, Pig 0.17, Oracle 12c, Oozie 4.3, Tableau, OLAP, PL/SQL, Kafka 1.0.
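Illustrative sketch only: one common way to land Kafka data in HDFS, shown here with Spark Structured Streaming rather than the classic high-level consumer API mentioned above. The broker address, topic, and paths are hypothetical placeholders (the kafka source also assumes the spark-sql-kafka connector is on the classpath).

```scala
// Sketch: subscribe to a hypothetical Kafka topic and append the raw events
// to HDFS as Parquet with checkpointing, using Structured Streaming.
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch")
      .getOrCreate()

    // Subscribe to a (hypothetical) Kafka topic on a placeholder broker.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "web_logs")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

    // Continuously write to HDFS; the checkpoint directory tracks progress.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/web_logs")
      .option("checkpointLocation", "hdfs:///checkpoints/web_logs")
      .start()

    query.awaitTermination()
  }
}
```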
Confidential - Houston, TX
Data Modeler
Responsibilities:
- Understood and translated business needs into data models supporting underwriting workstation services.
- Created DDL scripts using Erwin and source to target mappings to bring the data from source to the warehouse.
- Developed dimensional model for Data Warehouse/OLAP applications by identifying required facts and dimensions.
- Designed star schemas for the detailed data marts and plan data marts, consisting of conformed dimensions.
- Developed logical data models and physical database design and generated database schemas using Erwin.
- Reverse Engineered the existing Stored Procedures and wrote Mapping Documents for them.
- Developed stored procedures, triggers, packages, functions, and exception handling using PL/SQL.
- Designed both 3NF data models for OLTP systems and dimensional data models.
- Worked on reporting requirements and was involved in generating reports for the data model using Crystal Reports.
- Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting requirements were met for the business.
- Validated existing Data Quality rules to ensure they meet Data Governance requirements.
- Involved in writing queries and stored procedures using MySQL and SQL Server.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Developed solutions for data quality issues and collaborated with the business and IT to implement those solutions.
- Created SQL queries using TOAD and SQL Navigator, and created various database objects such as stored procedures, tables, and views.
- Created DataStage jobs (ETL processes) to continuously populate the data warehouse from different source systems.
- Worked on Metadata exchange among various proprietary systems using XML.
- Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
- Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.
- Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
- Used the Data Vault modeling method, which was adaptable to the needs of this project.
- Created business requirement documents and integrated the requirements and underlying platform functionality.
Environment: Erwin 9.5, Teradata 14.0, Oracle 11g, SQL, PL/SQL, OLAP, OLTP, TOAD, ETL, XML, MySQL, Crystal reports 14.1x.
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Worked as a Data Analyst/Modeler to generate Data Models and subsequent deployment to Enterprise Data Warehouse.
- Conducted source data analysis of various data sources and developed source-to-target mappings with business rules.
- Conducted data modeling JAD sessions and communicated data-related standards.
- Generated DDL statements for the creation of new ER/Studio objects such as tables, views, indexes, packages, and stored procedures.
- Designed and developed Oracle PL/SQL and shell scripts for data import/export, data conversion, and data cleansing.
- Performed reverse engineering on the existing data model to understand the data flow and business flows.
- Performed data profiling to identify data issues upfront and provided SQL prototypes to confirm the business logic prior to development.
- Designed the data model and data exchange metadata model for all interfaces and data exchanges.
- Developed Conceptual, Logical and Physical data models for central model consolidation.
- Provided PL/SQL queries to developers as source queries, identifying the data and the assignment logic.
- Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
- Designed and developed use cases, activity diagrams, sequence diagrams, and Object-Oriented Design (OOD) artifacts using UML and Visio.
- Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management.
- Developed and deployed quality T-SQL code, stored procedures, views, functions, triggers, and jobs.
- Effectively used triggers and stored procedures necessary to meet specific application's requirements.
- Designed and maintained data models for OLTP and OLAP systems, ODS, and data marts using 3NF and dimensional design.
- Created SQL scripts for database modification and performed multiple data modeling tasks at the same time under tight schedules.
- Used DataStage Designer to design and develop jobs for extracting, cleansing, transforming, integrating, and loading data into different data marts.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XIR2.
- Performed analysis and presented results using SQL, SSIS, Excel, and Visual Basic scripts.
Environment: ER/Studio, Oracle 11g, SQL, PL/SQL, T-SQL, ODS, OLAP, OLTP, Business Objects, SSIS, MS Excel 2012