Sr. Big Data Engineer Resume

West Point, PA

SUMMARY

  • 6+ years of experience as a Big Data Engineer / Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Strong experience in architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF, and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Experience in transferring data using the Informatica tool from AWS S3 to AWS Redshift.
  • Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
  • Managed ELDM Logical and Physical Data Models in the ER Studio Repository based on the different subject area requests for the integrated model.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension Tables) using Analysis Services.
  • Good understanding and exposure to Python programming.
  • Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Solid understanding of the architecture and working of the Hadoop framework, involving the Hadoop Distributed File System and its ecosystem components MapReduce, Pig, Hive, HBase, Flume, Sqoop, and Oozie.
  • Experience in building highly reliable, scalable Big Data solutions on Hadoop distributions Cloudera, Hortonworks, and AWS EMR.
  • Good Experience on importing and exporting the data from HDFS and Hive into Relational Database Systems like MySQL and vice versa using Sqoop.
  • Good knowledge on NoSQL Databases including HBase, MongoDB, MapR-DB.
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Familiar with Amazon Web Services, along with provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS, and others.
  • Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
  • Experience with Client-Server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
  • Good experience in Data Analysis; proficient in gathering business requirements and handling requirements management.
  • Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.

TECHNICAL SKILLS

Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17

Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.

Cloud Platforms: AWS (EC2, S3, Redshift) & MS Azure

BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R

Operating Systems: Microsoft Windows Vista/7/8/10, UNIX, and Linux.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential - West Point, PA

Sr. Big Data Engineer

Responsibilities:

  • As a Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
  • Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Built data pipelines that enable faster, better, data-informed decision-making within the business.
  • Identified data within different data stores, such as tables, files, folders, and documents, to create a dataset in a pipeline using Azure HDInsight.
  • Performed detailed analysis of business problems and technical environments and used this analysis in designing the solution and maintaining the data architecture.
  • Used data integration to manage data with speed and scalability using the Apache Spark engine in Azure Databricks.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis (see the sketch following this list).
  • Developed Pig scripts for transforming data, extensively using event joins, filtering, and pre-aggregations.
  • Performed data scrubbing and processing with Apache NiFi, and used it for workflow automation and coordination.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Developed simple to complex streaming jobs using Python, Hive, and Pig.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Built Azure Data Warehouse Table Data sets for Power BI Reports.
  • Worked on BI reporting with AtScale OLAP for Big Data.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
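
A minimal PySpark sketch of the partitioned Hive external-table pattern referenced above (the database, table, column names, and HDFS paths are hypothetical placeholders, not project details):

    # Load staged data and write it to a partitioned, external Hive table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("customer-warehouse-load")
             .enableHiveSupport()
             .getOrCreate())

    # Staged records assumed to land in HDFS as Parquet.
    staged = spark.read.parquet("hdfs:///data/staging/customers")

    # Light cleanup before loading the warehouse table.
    cleaned = staged.dropDuplicates(["customer_id"]).filter("customer_id IS NOT NULL")

    # Partitioning by load_date keeps date-filtered Hive queries from scanning
    # the full dataset; the explicit path makes the table external.
    (cleaned.write
        .mode("overwrite")
        .partitionBy("load_date")
        .option("path", "hdfs:///warehouse/external/customers")
        .saveAsTable("analytics.customers"))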

Environment: Hive 2.3, Pig 0.17, Python, HDFS, Hadoop 3.0, Azure, NoSQL, Sqoop 1.4, Oozie, Power BI, Agile, OLAP.

Confidential - Peoria IL

Big Data Engineer

Responsibilities:

  • Participated in requirements sessions to gather requirements along with business analysts and product owners.
  • Involved in Agile development methodology; active member in scrum meetings.
  • Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Installed and configured Hive, wrote Hive UDFs, and provided cluster coordination services through ZooKeeper.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • The objective of this project was to build a data lake as a cloud-based solution on AWS using Apache Spark.
  • Installed and configured Hadoop Ecosystem components.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (see the sketch following this list).
  • Involved in evaluating Kafka and building use cases relevant to our environment.
  • Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers; actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
  • Created Integration Relational 3NF models that can functionally relate to other subject areas, and was responsible for determining transformation rules accordingly in the Functional Specification Document.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
  • Developed Spark code using Scala for faster testing and processing of data.
  • Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
  • Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the output data fed to AWS S3.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
  • Involved in loading data from Unix file system to HDFS.
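
A minimal PySpark RDD sketch of the HDFS data-lake transformations referenced above (the paths, delimiter, and field layout are hypothetical placeholders):

    # Pull raw delimited records from the data lake and reshape them with
    # map/filter/reduceByKey transformations.
    from pyspark import SparkContext

    sc = SparkContext(appName="weblog-rdd-transformations")

    raw = sc.textFile("hdfs:///datalake/raw/weblogs/*")

    # Keep well-formed lines, project the page field, and count hits per page.
    page_hits = (raw.map(lambda line: line.split("\t"))
                    .filter(lambda fields: len(fields) >= 3)
                    .map(lambda fields: (fields[2], 1))
                    .reduceByKey(lambda a, b: a + b))

    page_hits.saveAsTextFile("hdfs:///datalake/curated/page_hits")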

Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS, HBase 1.2, Kafka, Scala 2.12, NoSQL, Cassandra 3.11, Elastic Search, UNIX, Zookeeper 3.4

Confidential - Rensselaer, NY

Data Engineer

Responsibilities:

  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Participated in requirements sessions to gather requirements along with business analysts and product owners.
  • Involved in Agile development methodology; active member in scrum meetings.
  • Involved in Data Profiling and merging data from multiple data sources.
  • Involved in Big data requirement analysis, develop and design solutions for ETL and Business Intelligence platforms.
  • Designed 3NF data models for ODS and OLTP systems, and dimensional data models using Star and Snowflake schemas.
  • Worked in the Snowflake environment to remove redundancy and load real-time data from various data sources into HDFS using Kafka.
  • Developed data warehouse model in Snowflake for over 100 datasets.
  • Implemented a fully operational production grade large scale data solution on Snowflake Data Warehouse.
  • Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python, and migrated on-premises big data workloads to AWS.
  • Responsible for the design and development of advanced Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Identified target groups by conducting Segmentation analysis using Clustering techniques like K-means.
  • Wrote Python scripts to parse XML documents and load the data into the database (see the sketch following this list).
  • Used Python to extract weekly information from XML files.
  • Developed Python scripts to clean the raw data.
  • Worked on QA of the data and on adding data sources, snapshots, and caching to the report.
  • Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
  • Involved in SQL development, unit testing, and performance tuning, and ensured testing issues were resolved using defect reports.
  • Involved in preparing SQL and PL/SQL coding conventions and standards.
  • Involved in preparing functional specifications, technical documentation, schema documents, flow charts and user support documents.
  • Involved in Data mapping specifications to create and execute detailed system test plans.
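
A minimal sketch of the XML-parsing and load pattern referenced above, using only the Python standard library (the file name, element/attribute names, and target table are hypothetical; sqlite3 stands in for the project database, since any DB-API driver follows the same pattern):

    import sqlite3
    import xml.etree.ElementTree as ET

    def parse_orders(path):
        """Yield (order_id, customer, amount) tuples from an <orders> document."""
        root = ET.parse(path).getroot()
        for order in root.findall("order"):
            yield (
                order.get("id"),
                order.findtext("customer"),
                float(order.findtext("amount", default="0")),
            )

    conn = sqlite3.connect("staging.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", parse_orders("orders.xml"))
    conn.commit()
    conn.close()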

Technologies: Agile, ODS, OLTP, ETL, HDFS, Kafka, AWS, S3, Python, K-means, XML, SQL

Confidential - Washington, DC

Sr. Data Analyst

Responsibilities:

  • Worked with the business analysts to understand the project specification and helped them to complete the specification.
  • Gathered and documented the Audit trail and traceability of extracted information for data quality.
  • Worked in Data Analysis, data profiling and data governance identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
  • Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
  • Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
  • Extensive experience in relational and physical data modeling for creating logical and physical database designs and ER diagrams.
  • Extracted data using SSIS from DB2, XML, Oracle, Excel, and flat files; performed transformations and populated the data warehouse.
  • Wrote Teradata SQL queries and created tables and views following Teradata best practices.
  • Prepared Business Requirement Documentation and Functional Documentation.
  • Primarily responsible for coordinating between the project sponsor and stakeholders.
  • Conducted JAD sessions to bring together different stakeholders, such as editorial and design teams.
  • Performed Business Process mapping for new requirements.
  • Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
  • Used SQL and PL/SQL to validate the data going into the data warehouse.
  • Wrote complex SQL and PL/SQL testing scripts for backend testing of the data warehouse application; expert in writing complex SQL/PL/SQL scripts querying Teradata and Oracle (see the sketch following this list).
  • Used TOAD software for querying Oracle and WinSQL for querying DB2.
  • Extensively tested the Business Objects report by running the SQL queries on the database by reviewing the report requirement documentation.
  • Implemented the Data Cleansing using various transformations.
  • Used DataStage Director for running and monitoring performance statistics.
  • Reverse Engineered the existing ODS into Erwin.
  • Created reports to retrieve data using Stored Procedures.
  • Designed and implemented basic SQL queries for testing and report/data validation.
  • Ensured the compliance of the extracts to the Data Quality Center initiatives.
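
A minimal sketch of the backend-validation pattern referenced above: reconcile row counts between a source table and its warehouse target from Python. The DSNs and table names are hypothetical, and pyodbc is used here only as a generic DB-API driver; the same check can be written directly in SQL:

    import pyodbc

    def row_count(conn, table):
        # Simple reconciliation metric; more detailed checks would compare
        # column-level aggregates or checksums.
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM " + table)
        return cur.fetchone()[0]

    src = pyodbc.connect("DSN=SOURCE_DB")
    tgt = pyodbc.connect("DSN=WAREHOUSE_DB")

    source_rows = row_count(src, "SALES_STG")
    target_rows = row_count(tgt, "DW_SALES_FACT")

    status = "OK" if source_rows == target_rows else "MISMATCH"
    print("source=%d target=%d %s" % (source_rows, target_rows, status))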

Environment: MS Access, MS Excel, Pivot Tables, E/R Diagrams, SSIS, DB2, XML, Oracle, Flat Files, Teradata, SQL, PL/SQL, TOAD

Confidential - Bloomington, IL

Data Analyst

Responsibilities:

  • Worked with Data Analysts to understand Business logic and User Requirements.
  • Worked closely with cross-functional data warehouse team members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
  • Created reports for the Data Analysis using SQL Server Reporting Services.
  • Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
  • Created SQL queries to simplify migration progress reports and analyses.
  • Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
  • Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
  • Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
  • Cleansed and manipulated data by subsetting, sorting, and pivoting on an as-needed basis (see the sketch following this list).
  • Used SQL Server and MS Excel on a daily basis to manipulate the data for business intelligence reporting needs.
  • Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
  • Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
  • Created reports from OLAP sources, sub-reports, bar charts, and matrix reports using SSRS.
  • Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision-making.
  • Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
  • Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
  • Extracted data from different sources performing Data Integrity and quality checks.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
  • Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
  • Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
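
A minimal pandas sketch of the subset/sort/pivot pattern referenced above (the file and column names are hypothetical placeholders):

    import pandas as pd

    # Columns assumed: region, product, month, revenue.
    sales = pd.read_csv("monthly_sales.csv")

    # Subset to the rows of interest and sort for review.
    east = sales[sales["region"] == "East"].sort_values("revenue", ascending=False)

    # Pivot: one row per product, one column per month, summed revenue in the cells.
    report = east.pivot_table(index="product", columns="month",
                              values="revenue", aggfunc="sum", fill_value=0)

    report.to_csv("east_region_revenue.csv")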

Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
