Sr. Big Data Engineer Resume
West Point, PA
SUMMARY
- Over six years of experience as a Big Data Engineer / Data Engineer, including designing, developing and implementing data models for enterprise-level applications and systems.
- Strong experience with architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
- Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
- Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
- Managed ELDM Logical and Physical Data Models in ER Studio Repository based on the different subject area requests for integrated model.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services.
- Good understanding and exposure to Python programming.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Solid understanding of architecture, working of Hadoop framework involving Hadoop Distribute File System and its eco-system components MapReduce, Pig, Hive, HBase, Flume, Sqoop, and Oozie.
- Experience in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks and AWS EMR Hadoop distributions.
- Good Experience on importing and exporting the data from HDFS and Hive into Relational Database Systems like MySQL and vice versa using Sqoop.
- Good knowledge on NoSQL Databases including HBase, MongoDB, MapR-DB.
- Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Familiar with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
- Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Centre.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Experienced in Data Analysis; proficient in gathering business requirements and handling requirements management.
- Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
- Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
- Proficient knowledge and hands on experience in writing shell scripts in Linux.
TECHNICAL SKILLS
Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17
Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.
Cloud Platforms: AWS (EC2, S3, Redshift) & MS Azure
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R
Operating Systems: Microsoft Windows Vista, 7, 8 and 10, UNIX, and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential - West Point, PA
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
- Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
- Built data pipelines that enabled faster, better, data-informed decision-making within the business.
- Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
- Performed detailed analysis of business problems and technical environments and use this data in designing the solution and maintaining data architecture.
- Used data integration to manage data with speed and scalability using the Apache Spark engine in Azure Databricks.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Performed Data transformations in Hive and used partitions, buckets for performance improvements.
- Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.
- Developed scripts in Pig for transforming data; extensively used event joins, filtered data, and performed pre-aggregations.
- Performed data scrubbing and processing with Apache NiFi for workflow automation and coordination.
- Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
- Developed Simple to complex streaming jobs using Python, Hive and Pig.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting.
- Built Azure Data Warehouse Table Data sets for Power BI Reports.
- Worked on BI reporting with AtScale OLAP for Big Data.
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hive 2.3, Pig 0.17, Python, HDFS, Hadoop 3.0, Azure, NoSQL, Sqoop 1.4, Oozie, Power BI, Agile, OLAP.
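The simple-to-complex streaming jobs noted above (Python alongside Hive and Pig) typically follow the Hadoop Streaming stdin/stdout contract. A minimal word-count-style sketch; the counting logic and invocation names are illustrative assumptions, not the production jobs:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit tab-separated (word, 1) pairs, as Hadoop Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reduce phase: sum counts per key; input arrives sorted by key after the shuffle."""
    split = (p.split("\t") for p in pairs)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetically invoked by hadoop-streaming as
    # `python job.py map` / `python job.py reduce`
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

In a real cluster the same script would be passed as both `-mapper` and `-reducer` to the hadoop-streaming jar, with HDFS input and output paths.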
Confidential - Peoria IL
Big Data Engineer
Responsibilities:
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in Agile development methodology as an active member in scrum meetings.
- Involved in the design, development and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and managed cluster coordination services through Zookeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured Hadoop Ecosystem components.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data, then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Worked with Kafka, building use cases relevant to our environment.
- Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers; actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Created integration relational 3NF models that can functionally relate to other subject areas, and was responsible for determining the corresponding transformation rules in the Functional Specification Document.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Developed Spark code using Scala for faster testing and processing of data.
- Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process on Hadoop, with the data fed to AWS S3.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Involved in loading data from Unix file system to HDFS.
Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS, HBase 1.2, Kafka, Scala 2.12, NoSQL, Cassandra 3.11, Elastic Search, UNIX, Zookeeper 3.4
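The RDD massaging described above follows the familiar filter/map/reduceByKey pattern. A minimal sketch using plain Python sequences as a stand-in for Spark RDDs; the record shape and filter rule are illustrative assumptions, not the production schema:

```python
# Hypothetical web-log records pulled from the data lake (HDFS);
# the field names are assumptions for illustration only.
records = [
    {"user": "u1", "page": "/home", "bytes": 120},
    {"user": "u2", "page": "/cart", "bytes": 0},
    {"user": "u1", "page": "/cart", "bytes": 340},
]

# The Spark chain rdd.filter(...).map(...).reduceByKey(add), expressed
# with the equivalent plain-Python operations:
valid = filter(lambda r: r["bytes"] > 0, records)       # drop empty responses
pairs = map(lambda r: (r["user"], r["bytes"]), valid)   # map to (key, value)

def reduce_by_key(pairs):
    """Aggregate values per key, like Spark's reduceByKey(operator.add)."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

bytes_per_user = reduce_by_key(pairs)
print(bytes_per_user)  # {'u1': 460}
```

On a real cluster the same chain runs lazily and in parallel across partitions; the stand-in only shows the shape of the transformation logic.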
Confidential - Rensselaer, NY
Data Engineer
Responsibilities:
- Worked as Data Engineer to review business requirement and compose source to target data mapping documents.
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in Agile development methodology as an active member in scrum meetings.
- Involved in Data Profiling and merge data from multiple data sources.
- Involved in Big data requirement analysis, develop and design solutions for ETL and Business Intelligence platforms.
- Designed 3NF data models for ODS and OLTP systems, and dimensional data models using Star and Snowflake schemas.
- Worked on Snowflake environment to remove redundancy and load real time data from various data sources into HDFS using Kafka.
- Developed data warehouse model in Snowflake for over 100 datasets.
- Implemented a fully operational production grade large scale data solution on Snowflake Data Warehouse.
- Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python, and migrated on-premises big data workloads to AWS.
- Responsible for the design and development of advanced Python programs to prepare, transform and harmonize data sets in preparation for modeling.
- Identified target groups by conducting Segmentation analysis using Clustering techniques like K-means.
- Wrote Python scripts to parse XML documents and load the data in database.
- Used Python to extract weekly information from XML files.
- Developed Python scripts to clean the raw data.
- Performed QA on the data and added data sources, snapshots, and caching to the report.
- Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
- Involved in SQL development, unit testing and performance tuning, and ensured testing issues were resolved using defect reports.
- Involved in preparing SQL and PL/SQL coding convention and standards.
- Involved in preparing functional specifications, technical documentation, schema documents, flow charts and user support documents.
- Involved in Data mapping specifications to create and execute detailed system test plans.
Technologies: Agile, ODS, OLTP, ETL, HDFS, Kafka, AWS, S3, Python, K-means, XML, SQL
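The Python XML-parsing and database-load work described above can be sketched with the standard library alone; the element names (`record`, `id`, `name`) and the SQLite target are illustrative assumptions standing in for the actual weekly feed and database:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical weekly XML extract; tag names are assumptions.
XML_DOC = """
<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>
"""

def parse_records(xml_text):
    """Parse <record> elements into (id, name) tuples."""
    root = ET.fromstring(xml_text)
    for rec in root.findall("record"):
        yield int(rec.findtext("id")), rec.findtext("name").strip()

def load(conn, rows):
    """Create the target table and bulk-insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany("INSERT INTO records (id, name) VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, parse_records(XML_DOC))
print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 2
```

The same parse-then-`executemany` shape applies against any DB-API database; only the connection and DDL change.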
Confidential - Washington, DC
Sr. Data Analyst
Responsibilities:
- Worked with the business analysts to understand the project specification and helped them to complete the specification.
- Gathered and documented the Audit trail and traceability of extracted information for data quality.
- Worked in Data Analysis, data profiling and data governance identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
- Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
- Extensive experience in relational and physical data modeling for creating logical and physical database designs and ER diagrams.
- Extracted data using SSIS from DB2, XML, Oracle, Excel and flat files, performed transformations, and populated the data warehouse.
- Wrote Teradata SQL queries and created tables and views following Teradata best practices.
- Prepared Business Requirement Documentation and Functional Documentation.
- Primarily responsible for coordinating between project sponsor and stake holders.
- Conducted JAD sessions with different stakeholders such as editorial staff and designers.
- Performed Business Process mapping for new requirements.
- Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
- Used SQL and PL/SQL to validate the data going into the data warehouse.
- Wrote complex SQL and PL/SQL testing scripts for backend testing of the data warehouse application; expert in writing complex SQL/PL-SQL scripts querying Teradata and Oracle.
- Used TOAD for querying Oracle and WinSQL for querying DB2.
- Extensively tested the Business Objects report by running the SQL queries on the database by reviewing the report requirement documentation.
- Implemented the Data Cleansing using various transformations.
- Used Data Stage Director for running and monitoring performance statistics.
- Reverse Engineered the existing ODS into Erwin.
- Created reports to retrieve data using Stored Procedures.
- Designed and implemented basic SQL queries for testing and report/data validation.
- Ensured the compliance of the extracts to the Data Quality Center initiatives.
Environment: MS Access, MS Excel, Pivot tables, E/R Diagrams, SSIS, DB2, XML, Oracle, flat files, Excel, Teradata, SQL, PL/SQL, TOAD
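The backend-testing scripts described above usually reduce to count and set-difference reconciliation between source and warehouse tables. A minimal sketch against SQLite as a stand-in for Teradata/Oracle; the table and column names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in source and warehouse tables; the real checks ran against
# Teradata and Oracle with the same SQL shape.
conn.executescript("""
CREATE TABLE src_orders (id INTEGER, amount REAL);
CREATE TABLE dw_orders  (id INTEGER, amount REAL);
INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5);
INSERT INTO dw_orders  VALUES (1, 10.0), (2, 25.5);
""")

def validate_counts(conn, src, target):
    """Row-count reconciliation: the classic first backend test."""
    n_src = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    n_tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return n_src == n_tgt

def validate_rows(conn, src, target):
    """Set-difference check: rows present in source but missing from target."""
    missing = conn.execute(
        f"SELECT * FROM {src} EXCEPT SELECT * FROM {target}"
    ).fetchall()
    return missing  # an empty list means the load reconciles

print(validate_counts(conn, "src_orders", "dw_orders"))  # True
print(validate_rows(conn, "src_orders", "dw_orders"))    # []
```

`EXCEPT` is standard SQL, so the same queries run largely unchanged on Teradata and Oracle (where `MINUS` is the traditional spelling).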
Confidential - Bloomington, IL
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Worked closely with cross-functional data warehouse team members to import data into SQL Server, and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
- Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
- Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP cubes, sub-reports, bar charts and matrix reports using SSRS.
- Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
- Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
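The VLOOKUP work above has a direct scripting analogue: a dictionary keyed on the lookup column, equivalent to an exact-match `VLOOKUP(..., FALSE)`. The spreadsheet columns below are illustrative assumptions:

```python
# Stand-in for the large spreadsheet being searched; columns are assumed
# to be (SKU, description, unit price) for illustration only.
products = [
    ("P-100", "Widget", 9.99),
    ("P-200", "Gadget", 24.50),
]
orders = [("P-200", 3), ("P-100", 1), ("P-999", 2)]

# Build the lookup table once: VLOOKUP's "table_array" keyed on column 1.
price_by_sku = {sku: price for sku, _, price in products}

def vlookup(sku, default=None):
    """Exact-match lookup; returns default (here None) for misses, like #N/A."""
    return price_by_sku.get(sku, default)

totals = [(sku, qty, vlookup(sku)) for sku, qty in orders]
print(totals)  # [('P-200', 3, 24.5), ('P-100', 1, 9.99), ('P-999', 2, None)]
```

Each lookup is O(1) against the dict, versus a scan per VLOOKUP call in a worksheet, which is why this pattern scales to the large spreadsheets mentioned above.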