Sr. Big Data Engineer Resume
Philadelphia, PA
SUMMARY:
- Over 9 years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Good experience importing and exporting data between HDFS/Hive and relational database systems such as MySQL using Sqoop (a minimal sketch of this pattern follows this summary).
- Good knowledge of NoSQL databases including HBase, MongoDB, and MapR-DB.
- Installation, configuration, and administration experience with Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
- Strong experience with and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Familiar with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, and RDS.
- Expertise in data migration, data profiling, data cleansing, transformation, integration, data import, and data export using multiple ETL tools such as Informatica PowerCenter.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
- Experienced in R and Python for statistical computing; also experienced with MLlib (Spark), MATLAB, Excel, Minitab, SPSS, and SAS.
- Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
- Strong experience working with databases such as Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
- Experience in text analytics and in developing statistical machine learning and data mining solutions for various business problems, generating data visualizations using R, SAS, and Python and creating dashboards with tools like Tableau.
- Experienced in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
- Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System and its ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Ambari, ZooKeeper, Oozie, Storm, Spark, and Kafka.
- Experience building highly reliable, scalable Big Data solutions on Hadoop distributions: Cloudera, Hortonworks, and AWS EMR.
- Good experience in data modeling and data analysis; proficient in gathering business requirements and handling requirements management.
- Hands-on experience in normalization (1NF, 2NF, 3NF, and BCNF) and denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
- Experience transferring data from AWS S3 to AWS Redshift using Informatica.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Managed ELDM logical and physical data models in the ER/Studio repository based on different subject-area requests for the integrated model.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Created data models (ERD, logical) with robust data definitions, including entity-relationship-attribute, star, and snowflake models.
- Solid knowledge of data marts, Operational Data Stores (ODS), OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
- Good understanding of and exposure to Python programming.
- Experience migrating data between HDFS/Hive and relational database systems in both directions using Sqoop, according to client requirements.
- Experience with RDBMSs such as SQL Server, MySQL, and Oracle and data warehouses such as Teradata and Netezza.
- Proficient knowledge of and hands-on experience writing shell scripts in Linux.
- Experience developing MapReduce jobs for data cleaning and data manipulation as required by the business.
- Strong knowledge of data warehouse architecture, star and snowflake schemas, and fact and dimension tables.
- Experience in SQL and good knowledge of PL/SQL programming; developed stored procedures and triggers; worked with DataStage, DB2, Unix, Cognos, MDM, Hadoop, and Pig.
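The Sqoop-style transfers referenced above can be illustrated with a short sketch. This is not code from any of the projects below; it assumes PySpark with Hive support, and the host, schema, and table names are hypothetical placeholders.

```python
# Illustrative only: move a MySQL table into Hive, the same movement a Sqoop import performs.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mysql-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read the source table over JDBC (hypothetical host, schema, and credentials).
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", "*****")
          .load())

# Land the data as a partitioned Hive table in a staging database.
(orders.write
 .mode("overwrite")
 .partitionBy("order_date")
 .saveAsTable("staging.orders"))
```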
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11
Data Modeling Tools: Erwin Data Modeler 9.7/9.6, Erwin Model Manager, ER Studio v17, and Power Designer.
Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.
Cloud Platforms: AWS (EC2, S3, Redshift) and MS Azure
OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows Vista, 7, 8, and 10, UNIX, and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE:
Confidential - Philadelphia, PA
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
- Responsible for the planning and execution of big data analytics, predictive analytics, and machine learning initiatives.
- Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
- Worked on creating documents in a MongoDB database.
- Performed detailed analysis of business problems and technical environments and used this analysis to design solutions and maintain the data architecture.
- Designed and developed software applications, testing, and build automation tools.
- Designed efficient and robust Hadoop solutions to improve performance and end-user experience.
- Worked on a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Conducted performance tuning of Hadoop clusters while monitoring and managing cluster job performance, capacity forecasting, and security.
- Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing data-at-rest utilizing Big Data technologies such as Hadoop, MapReduce, HBase, and Hive on a cloud architecture.
- Worked on implementation and maintenance of a Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load data into HDFS.
- Pulled data from the data lake (HDFS) and shaped it with various RDD transformations.
- Actively involved in design, new development, and SLA-based support tickets for Big Machines applications.
- Developed Scala scripts and UDFs, using both DataFrames/Spark SQL and RDD/MapReduce in Spark, for data aggregation, queries, and writing data back into the RDBMS through Sqoop (see the sketch at the end of this section).
- Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Developed numerous MapReduce jobs in Scala for data cleansing and analyzed data in Impala.
- Created data pipelines using processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Worked on creating various types of indexes on different collections to achieve good performance in MongoDB.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Loaded data from sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
- Built proofs of concept to determine feasibility and evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Implemented a proof of concept deploying this product on Amazon Web Services (AWS).
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
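A minimal sketch of the aggregation-and-writeback pattern described above. The project itself used Scala; this illustration uses PySpark for consistency with the other sketches in this document, and the HDFS path, column names, and JDBC details are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("daily-aggregation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Pull raw events from the data lake (hypothetical HDFS path) and apply a basic cleansing step.
events = spark.read.parquet("hdfs:///data/lake/events/")
cleaned = events.filter(F.col("event_type").isNotNull())

# Aggregate events per customer per day.
daily = (cleaned
         .groupBy("customer_id", F.to_date("event_ts").alias("event_date"))
         .agg(F.count("*").alias("event_count")))

# Write the aggregates back to the RDBMS over JDBC (the role a Sqoop export plays);
# connection details below are placeholders.
(daily.write.format("jdbc")
 .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
 .option("dbtable", "analytics.daily_events")
 .option("user", "etl_user")
 .option("password", "*****")
 .mode("append")
 .save())
```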
Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Scala 2.12, Spark, Sqoop 1.4, Apache NiFi, AWS, EC2, SQL Server, Oracle 12c
Confidential - Durham, NC
Sr. Data Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive.
- Used Agile (Scrum) methodologies for software development.
- Wrote complex Hive queries to extract data from heterogeneous sources (the data lake) and persist the data into HDFS.
- Architected, designed, and developed business applications and data marts for reporting.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Developed a reconciliation process to ensure the Elasticsearch index document count matches the source records.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load data into HDFS.
- Developed incremental- and full-load Python processes to ingest data into Elasticsearch from an Oracle database.
- Pulled data from the data lake (HDFS) and shaped it with various RDD transformations.
- Experienced in the AWS cloud environment, including S3 storage and EC2 instances.
- Developed Scala scripts and UDFs, using both DataFrames/Spark SQL and RDD/MapReduce in Spark, for data aggregation, queries, and writing data back into the RDBMS through Sqoop.
- Loaded data from sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed REST services to write data into an Elasticsearch index using Python Flask (see the sketch at the end of this section).
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Supported setting up the QA environment and updating configurations to implement scripts with Pig, Hive, and Sqoop.
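A minimal sketch of a Flask service that indexes documents into Elasticsearch, in the spirit of the REST services described above. It assumes the elasticsearch-py 8.x client; the endpoint, index, and field names are hypothetical.

```python
from flask import Flask, request, jsonify
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")  # hypothetical cluster endpoint

@app.route("/documents", methods=["POST"])
def index_document():
    doc = request.get_json()
    # Use the source record's primary key as the document id so reloads stay idempotent.
    result = es.index(index="patient_records", id=doc["record_id"], document=doc)
    return jsonify({"result": result["result"]}), 201

if __name__ == "__main__":
    app.run(port=5000)
```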
Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache NiFi 1.6, YARN, HBase, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Flume 1.8
Confidential - Union, NJ
Data Engineer
Responsibilities:
- Worked with business analysts to understand the user requirements, layout, and look of the interactive dashboard to be developed in Tableau.
- Gathered and documented all business requirements to migrate reports from SAS to a Netezza platform utilizing the MicroStrategy reporting tool.
- Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
- Used Python programs for data manipulation and to automate the generation of reports from multiple data sources and dashboards.
- Designed and implemented the data warehouse life cycle and entity-relationship/multidimensional modeling using star and snowflake schemas.
- Involved extensively in creating Tableau extracts, worksheets, actions, functions, and connectors (live and extract), including drill-down and drill-up capabilities, dashboard color coding, formatting, and report operations (sorting, filtering, Top-N analysis, hierarchies).
- Performed data blending of patient information from different sources for research using Tableau and Python.
- Used Boto3 to integrate Python applications with AWS Redshift, Teradata, and S3 (see the sketch at the end of this section).
- Involved in Netezza administration activities such as backup/restore, performance tuning, and security configuration.
- Wrote complex SQL statements to perform high-level and detailed validation tasks for new data and/or architecture changes within the model, comparing Teradata data against Netezza data.
- Utilized Python frameworks and libraries such as Pandas, NumPy, and SciPy for analyzing and manipulating data from sources including AWS Redshift and Teradata.
- Developed Python programs and batch scripts on Windows to automate ETL processes into AWS Redshift.
- Managed the metadata associated with the ETL processes used to populate the data warehouse.
- Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard using parameters.
- Published workbooks with user filters so that only the appropriate teams can view them.
- Worked on SAS Visual Analytics and SAS Web Report Studio for data presentation and reporting.
- Extensively used SAS macros to parameterize reports so that users could choose the summary and sub-setting variables from the web application.
- Created Teradata external loader connections such as MultiLoad, Upsert, Update, and FastLoad while loading data into target tables in the Teradata database.
- Resolved data-related issues such as assessing data quality, testing dashboards, and evaluating existing data sources.
- Created DDL scripts to implement data modeling changes, reviewed SQL queries, and was involved in database design and implementing RDBMS-specific features.
- Created data mapping documents mapping logical data elements to physical data elements and source data elements to destination data elements.
- Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
- Designed the ETL process using Informatica to populate the data mart from flat files into the Oracle database.
- Involved in data analysis and reporting using Tableau and SSRS.
- Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
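A minimal sketch of the Boto3/Redshift load pattern referenced above: stage an extract in S3, then COPY it into Redshift. The bucket, table, IAM role, and connection details are hypothetical placeholders.

```python
import boto3
import psycopg2

# Stage the extract file in S3 with Boto3 (bucket and key are hypothetical).
s3 = boto3.client("s3")
s3.upload_file("daily_extract.csv", "analytics-staging-bucket", "extracts/daily_extract.csv")

# Load the staged file into Redshift with a COPY command (connection details are placeholders).
conn = psycopg2.connect(
    host="redshift-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="*****")
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY reporting.daily_extract
        FROM 's3://analytics-staging-bucket/extracts/daily_extract.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        CSV IGNOREHEADER 1;
    """)
conn.close()
```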
Environment: Tableau Server 9.3, Tableau Desktop 9.3, AWS Redshift, Teradata, Python, SQL, PostgreSQL, Linux, Teradata SQL Assistant, Netezza, EC2, S3, Windows, PL/SQL
Confidential - Lake Mary, FL
Sr. Data Analyst/Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
- Gathered and documented all business requirements to migrate reports from SAS to a Netezza platform utilizing the MicroStrategy reporting tool.
- Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
- Used Python programs for data manipulation and to automate the generation of reports from multiple data sources and dashboards.
- Worked on NoSQL databases including Cassandra; implemented a multi-data-center, multi-rack Cassandra cluster.
- Coordinated with data architects on provisioning AWS EC2 infrastructure and deploying applications behind Elastic Load Balancing.
- Wrote complex SQL statements to perform high-level and detailed validation tasks for new data and/or architecture changes within the model, comparing Teradata data against Netezza data (see the validation sketch at the end of this section).
- Utilized Python frameworks and libraries such as Pandas, NumPy, and SciPy for analyzing and manipulating data from sources including AWS Redshift and Teradata.
- Developed Python programs and batch scripts on Windows to automate ETL processes into AWS Redshift.
- Managed the metadata associated with the ETL processes used to populate the data warehouse.
- Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard using parameters.
- Performed reverse engineering of the current application using Erwin and developed logical and physical data models for central model consolidation.
- Translated logical data models into physical database models and generated DDL for DBAs.
- Performed data analysis and data profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
- Extensively used ETL methodology to support data extraction, transformation, and loading in a complex data warehouse using Informatica.
- Developed and maintained sales reporting using MS Excel queries, SQL in Teradata, and MS Access.
- Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing, and data migration.
- Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
- Published workbooks with user filters so that only the appropriate teams can view them.
- Worked on SAS Visual Analytics and SAS Web Report Studio for data presentation and reporting.
- Extensively used SAS macros to parameterize reports so that users could choose the summary and sub-setting variables from the web application.
- Created Teradata external loader connections such as MultiLoad, Upsert, Update, and FastLoad while loading data into target tables in the Teradata database.
- Resolved data-related issues such as assessing data quality, testing dashboards, and evaluating existing data sources.
- Created DDL scripts to implement data modeling changes, reviewed SQL queries, and was involved in database design and implementing RDBMS-specific features.
- Created data mapping documents mapping logical data elements to physical data elements and source data elements to destination data elements.
- Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
- Wrote complex SQL queries to validate data against different kinds of reports generated by Business Objects XI R2.
- Performed gap analysis of the current state against the desired state and documented requirements to control the identified gaps.
- Developed the batch program in PL/SQL for OLTP processing and used Unix shell scripts to run it via crontab.
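A minimal Pandas sketch of the cross-platform validation described above, comparing one table's row count and a summed metric between Teradata and Netezza. Connection setup is assumed to happen elsewhere (it depends on the drivers in use, e.g. teradatasql and nzpy), and the table and column names are hypothetical.

```python
import pandas as pd

def validate_table(td_conn, nz_conn, table, amount_column):
    """Compare row counts and a summed metric for one table across both platforms."""
    query = f"SELECT COUNT(*) AS row_count, SUM({amount_column}) AS metric_total FROM {table}"
    teradata_result = pd.read_sql(query, td_conn).assign(source="teradata")
    netezza_result = pd.read_sql(query, nz_conn).assign(source="netezza")
    return pd.concat([teradata_result, netezza_result], ignore_index=True)

# Example usage, assuming DB-API connections created elsewhere:
# report = validate_table(td_conn, nz_conn, "claims.member_payments", "paid_amount")
# print(report)
```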
Environment: Erwin 9.0, PL/SQL, Business Objects XI R2, Informatica 8.6, Oracle 11g, Teradata R13, Teradata SQL Assistant 12.0, Flat Files
Confidential
Data Analyst
Responsibilities:
- Worked extensively with the business analysis team and scrum masters to gather requirements and understand the organization's workflows.
- Involved in data mapping specifications to create and execute detailed system test plans; the data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.
- Analyzed business requirements, system requirements, and data mapping requirement specifications; responsible for documenting functional and supplementary requirements in Quality Center.
- Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
- Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements.
- Created the test environment for the staging area and loaded the staging area with data from multiple sources.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Monitored the data quality of the daily processes and ensured data integrity was maintained for the effective functioning of the departments.
- Developed data mapping documents for integration into a central model, depicting data flow across systems, and maintained all files in an electronic filing system.
- Worked with and extracted data from various database sources such as DB2, CSV, XML, and flat files into DataStage.
- Used and supported database applications and tools for the extraction, transformation, and analysis of raw data.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and DB2 (see the profiling sketch at the end of this section).
- Wrote SQL scripts to test the mappings and developed a traceability matrix of business requirements mapped to test scripts to ensure any change control in requirements leads to test case updates.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
- Delivered files in various formats (e.g., Excel, tab-delimited text, comma-separated text, pipe-delimited text).
- Performed ad hoc analyses as needed.
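The profiling above was done in SQL; purely as an illustration, an equivalent minimal column profile (type, null count, distinct count) in Pandas, against a hypothetical extract file.

```python
import pandas as pd

# Hypothetical extract file; any tabular source profiles the same way.
df = pd.read_csv("source_extract.csv")

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_count": df.isna().sum(),
    "distinct_count": df.nunique(),
})
print(profile)
```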
Environment: Oracle 9i, SQL, DB2, XML, Excel 2008