Sr. Big Data Engineer Resume
West Point, PA
SUMMARY
- Over six years of experience as a Big Data Engineer / Data Engineer, including designing, developing and implementing data models for enterprise-level applications and systems.
- Strong experience with architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
- Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
- Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
- Managed ELDM Logical and Physical Data Models in ER Studio Repository based on the different subject area requests for integrated model.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services.
- Good understanding and exposure to Python programming.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Solid understanding of architecture, working of Hadoop framework involving Hadoop Distribute File System and its eco-system components MapReduce, Pig, Hive, HBase, Flume, Sqoop, and Oozie.
- Experience in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks and AWS EMR Hadoop distributions.
- Good Experience on importing and exporting the data from HDFS and Hive into Relational Database Systems like MySQL and vice versa using Sqoop.
- Good knowledge on NoSQL Databases including HBase, MongoDB, MapR-DB.
- Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Familiar with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
- Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Centre.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Experienced in Data Analysis; proficient in gathering business requirements and handling requirements management.
- Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
- Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
- Proficient knowledge and hands on experience in writing shell scripts in Linux.
TECHNICAL SKILLS
Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17
Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.
Cloud Platforms: AWS (EC2, S3, Redshift) & MS Azure
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R
Operating Systems: Microsoft Windows Vista, 7, 8 and 10, UNIX, and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential - West Point, PA
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
- Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
- Built data pipelines that enabled faster, better, data-informed decision-making within the business.
- Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
- Performed detailed analysis of business problems and technical environments and use this data in designing the solution and maintaining data architecture.
- Used data integration to manage data with speed and scalability using the Apache Spark engine in Azure Databricks.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Performed Data transformations in Hive and used partitions, buckets for performance improvements.
- Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.
- Developed scripts in Pig for transforming data; extensively used event joins, filtered data, and performed pre-aggregations.
- Performed data scrubbing and processing with Apache NiFi for workflow automation and coordination.
- Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
- Developed Simple to complex streaming jobs using Python, Hive and Pig.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting.
- Built Azure Data Warehouse Table Data sets for Power BI Reports.
- Worked on BI reporting with AtScale OLAP for Big Data.
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hive 2.3, Pig 0.17, Python, HDFS, Hadoop 3.0, Azure, NoSQL, Sqoop 1.4, Oozie, Power BI, Agile, OLAP.
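The simple-to-complex streaming jobs noted above (Python alongside Hive and Pig) typically follow the Hadoop Streaming stdin/stdout contract. A minimal word-count-style sketch; the counting logic and invocation names are illustrative assumptions, not the production jobs:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit tab-separated (word, 1) pairs, as Hadoop Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reduce phase: sum counts per key; input arrives sorted by key after the shuffle."""
    split = (p.split("\t") for p in pairs)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetically invoked by hadoop-streaming as
    # `python job.py map` / `python job.py reduce`
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

In a real cluster the same script would be passed as both `-mapper` and `-reducer` to the hadoop-streaming jar, with HDFS input and output paths.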
Confidential - Peoria IL
Big Data Engineer
Responsibilities:
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in Agile development methodology as an active member in scrum meetings.
- Involved in the design, development and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and managed cluster coordination services through Zookeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured Hadoop Ecosystem components.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data, then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Worked with Kafka, building use cases relevant to our environment.
- Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers; actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Created integration relational 3NF models that can functionally relate to other subject areas, and was responsible for determining the corresponding transformation rules in the Functional Specification Document.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Developed Spark code using Scala for faster testing and processing of data.
- Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process on Hadoop, with the data fed to AWS S3.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Involved in loading data from Unix file system to HDFS.
Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS, HBase 1.2, Kafka, Scala 2.12, NoSQL, Cassandra 3.11, Elastic Search, UNIX, Zookeeper 3.4
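The RDD massaging described above follows the familiar filter/map/reduceByKey pattern. A minimal sketch using plain Python sequences as a stand-in for Spark RDDs; the record shape and filter rule are illustrative assumptions, not the production schema:

```python
# Hypothetical web-log records pulled from the data lake (HDFS);
# the field names are assumptions for illustration only.
records = [
    {"user": "u1", "page": "/home", "bytes": 120},
    {"user": "u2", "page": "/cart", "bytes": 0},
    {"user": "u1", "page": "/cart", "bytes": 340},
]

# The Spark chain rdd.filter(...).map(...).reduceByKey(add), expressed
# with the equivalent plain-Python operations:
valid = filter(lambda r: r["bytes"] > 0, records)       # drop empty responses
pairs = map(lambda r: (r["user"], r["bytes"]), valid)   # map to (key, value)

def reduce_by_key(pairs):
    """Aggregate values per key, like Spark's reduceByKey(operator.add)."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

bytes_per_user = reduce_by_key(pairs)
print(bytes_per_user)  # {'u1': 460}
```

On a real cluster the same chain runs lazily and in parallel across partitions; the stand-in only shows the shape of the transformation logic.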
Confidential - Rensselaer, NY
Data Engineer
Responsibilities:
- Worked as Data Engineer to review business requirement and compose source to target data mapping documents.
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in Agile development methodology as an active member in scrum meetings.
- Involved in Data Profiling and merge data from multiple data sources.
- Involved in Big data requirement analysis, develop and design solutions for ETL and Business Intelligence platforms.
- Designed 3NF data models for ODS and OLTP systems, and dimensional data models using Star and Snowflake schemas.
- Worked on Snowflake environment to remove redundancy and load real time data from various data sources into HDFS using Kafka.
- Developed data warehouse model in Snowflake for over 100 datasets.
- Implemented a fully operational production grade large scale data solution on Snowflake Data Warehouse.
- Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python, and migrated on-premises big data workloads to AWS.
- Responsible for the design and development of advanced Python programs to prepare, transform and harmonize data sets in preparation for modeling.
- Identified target groups by conducting Segmentation analysis using Clustering techniques like K-means.
- Wrote Python scripts to parse XML documents and load the data in database.
- Used Python to extract weekly information from XML files.
- Developed Python scripts to clean the raw data.
- Performed QA on the data and added data sources, snapshots, and caching to the report.
- Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
- Involved in SQL development, unit testing and performance tuning, and ensured testing issues were resolved using defect reports.
- Involved in preparing SQL and PL/SQL coding convention and standards.
- Involved in preparing functional specifications, technical documentation, schema documents, flow charts and user support documents.
- Involved in Data mapping specifications to create and execute detailed system test plans.
Technologies: Agile, ODS, OLTP, ETL, HDFS, Kafka, AWS, S3, Python, K-means, XML, SQL
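The Python XML-parsing and database-load work described above can be sketched with the standard library alone; the element names (`record`, `id`, `name`) and the SQLite target are illustrative assumptions standing in for the actual weekly feed and database:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical weekly XML extract; tag names are assumptions.
XML_DOC = """
<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>
"""

def parse_records(xml_text):
    """Parse <record> elements into (id, name) tuples."""
    root = ET.fromstring(xml_text)
    for rec in root.findall("record"):
        yield int(rec.findtext("id")), rec.findtext("name").strip()

def load(conn, rows):
    """Create the target table and bulk-insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany("INSERT INTO records (id, name) VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, parse_records(XML_DOC))
print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 2
```

The same parse-then-`executemany` shape applies against any DB-API database; only the connection and DDL change.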
Confidential - Washington, DC
Sr. Data Analyst
Responsibilities:
- Worked with the business analysts to understand the project specification and helped them to complete the specification.
- Gathered and documented the Audit trail and traceability of extracted information for data quality.
- Worked in Data Analysis, data profiling and data governance identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
- Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
- Extensive experience in relational and physical data modeling for creating logical and physical database designs and ER diagrams.
- Extracted data using SSIS from DB2, XML, Oracle, Excel and flat files, performed transformations, and populated the data warehouse.
- Wrote Teradata SQL queries and created tables and views following Teradata best practices.
- Prepared Business Requirement Documentation and Functional Documentation.
- Primarily responsible for coordinating between project sponsor and stake holders.
- Conducted JAD sessions with different stakeholders such as editorial staff and designers.
- Performed Business Process mapping for new requirements.
- Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
- Used SQL and PL/SQL to validate the data going into the data warehouse.
- Wrote complex SQL and PL/SQL testing scripts for backend testing of the data warehouse application; expert in writing complex SQL/PL-SQL scripts querying Teradata and Oracle.
- Used TOAD for querying Oracle and WinSQL for querying DB2.
- Extensively tested the Business Objects report by running the SQL queries on the database by reviewing the report requirement documentation.
- Implemented the Data Cleansing using various transformations.
- Used Data Stage Director for running and monitoring performance statistics.
- Reverse Engineered the existing ODS into Erwin.
- Created reports to retrieve data using Stored Procedures.
- Designed and implemented basic SQL queries for testing and report/data validation.
- Ensured the compliance of the extracts to the Data Quality Center initiatives.
Environment: MS Access, MS Excel, Pivot tables, E/R Diagrams, SSIS, DB2, XML, Oracle, flat files, Excel, Teradata, SQL, PL/SQL, TOAD
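The backend-testing scripts described above usually reduce to count and set-difference reconciliation between source and warehouse tables. A minimal sketch against SQLite as a stand-in for Teradata/Oracle; the table and column names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in source and warehouse tables; the real checks ran against
# Teradata and Oracle with the same SQL shape.
conn.executescript("""
CREATE TABLE src_orders (id INTEGER, amount REAL);
CREATE TABLE dw_orders  (id INTEGER, amount REAL);
INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5);
INSERT INTO dw_orders  VALUES (1, 10.0), (2, 25.5);
""")

def validate_counts(conn, src, target):
    """Row-count reconciliation: the classic first backend test."""
    n_src = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    n_tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return n_src == n_tgt

def validate_rows(conn, src, target):
    """Set-difference check: rows present in source but missing from target."""
    missing = conn.execute(
        f"SELECT * FROM {src} EXCEPT SELECT * FROM {target}"
    ).fetchall()
    return missing  # an empty list means the load reconciles

print(validate_counts(conn, "src_orders", "dw_orders"))  # True
print(validate_rows(conn, "src_orders", "dw_orders"))    # []
```

`EXCEPT` is standard SQL, so the same queries run largely unchanged on Teradata and Oracle (where `MINUS` is the traditional spelling).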
Confidential - Bloomington, IL
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Worked closely with cross-functional data warehouse team members to import data into SQL Server, and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
- Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
- Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP cubes, sub-reports, bar charts and matrix reports using SSRS.
- Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
- Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
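The VLOOKUP work above has a direct scripting analogue: a dictionary keyed on the lookup column, equivalent to an exact-match `VLOOKUP(..., FALSE)`. The spreadsheet columns below are illustrative assumptions:

```python
# Stand-in for the large spreadsheet being searched; columns are assumed
# to be (SKU, description, unit price) for illustration only.
products = [
    ("P-100", "Widget", 9.99),
    ("P-200", "Gadget", 24.50),
]
orders = [("P-200", 3), ("P-100", 1), ("P-999", 2)]

# Build the lookup table once: VLOOKUP's "table_array" keyed on column 1.
price_by_sku = {sku: price for sku, _, price in products}

def vlookup(sku, default=None):
    """Exact-match lookup; returns default (here None) for misses, like #N/A."""
    return price_by_sku.get(sku, default)

totals = [(sku, qty, vlookup(sku)) for sku, qty in orders]
print(totals)  # [('P-200', 3, 24.5), ('P-100', 1, 9.99), ('P-999', 2, None)]
```

Each lookup is O(1) against the dict, versus a scan per VLOOKUP call in a worksheet, which is why this pattern scales to the large spreadsheets mentioned above.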