Sr. Big Data Engineer Resume
Hartford, CT
PROFESSIONAL SUMMARY:
- Over 7 years of experience as a Big Data Engineer/Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Hands-on experience writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza, and Teradata.
- Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, Elasticsearch), Hadoop, Python, and Spark, and in the effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, generating data visualizations using R, SAS, and Python, and creating dashboards using tools like Tableau.
- Well versed in Data Migration, Data Conversion, and Data Extraction/Transformation/Loading (ETL).
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
- Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
- Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experienced in using high-level data processing tools such as Pig, Hive, and Spark with Hadoop.
- Excellent skills in fine-tuning ETL mappings in Informatica.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience in architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio, Teradata, BTEQ, MLDM and MDM.
- Extensively used MLoad, BTEQ, FastExport, and FastLoad to design and develop data flow paths for loading, transforming, and maintaining the data warehouse.
- Experienced in R and Python for statistical computing, as well as Spark MLlib, MATLAB, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop).
- Good experience in using SSRS and Cognos in creating and managing reports for an organization.
- Excellent experience with NoSQL databases such as MongoDB and Cassandra.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
TECHNICAL SKILLS:
ETL Tools: Informatica 10.1/9.6.1 (PowerCenter/PowerMart: Designer, Workflow Manager, Workflow Monitor, Server Manager, PowerConnect), Talend, IDQ.
AWS tools: EC2, S3 Bucket, AMI, RDS, Redshift.
Big Data: MapReduce, HBase, Pig, Hive, Impala, Sqoop, Spark
Data Modeling Tools: ER/Studio 9.7/9.0, Erwin 9.7/9.6
Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.
Languages: SQL, PL/SQL, UNIX shell scripting (Korn shell, C shell).
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server, Netezza.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
Operating System: Windows, Unix, Linux
WORK EXPERIENCE:
Confidential - Hartford, CT
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Worked closely with Business Analysts to review the business specifications of the project and also to gather the ETL requirements.
- Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Used Agile (SCRUM) methodologies for Software Development.
- Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in HDFS.
- Built a data lake as a cloud-based solution in AWS using Apache Spark and provided visualization of the ETL orchestration using the CDAP tool.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System (HDFS) and open-source frameworks.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Worked in exporting data from Hive 2.0.0 tables into Netezza 7.2 database.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica 9.5 to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the sketch after this list).
- Created Data Pipeline using Processor Groups and multiple processors using Apache NiFi for Flat File, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used AWS Cloud with Infrastructure Provisioning / Configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Actively involved in design, new development, and SLA-based support tickets for BigMachines applications.
- Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Utilized Big Data components like tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tHbaseInput, tHbaseOutput, tSqoopImport and tSqoopExport.
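A minimal sketch of the cleanse-aggregate-stage pattern described in the bullets above (UDF-based cleansing, DataFrame aggregation, staging to Hive ahead of a Sqoop/JDBC export). The project code was written in Scala; this PySpark version is shown for illustration only, and all table, column, and application names are hypothetical placeholders.

```python
# Illustrative PySpark sketch of the aggregation-and-stage pattern; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("weblog-aggregation")   # hypothetical application name
         .enableHiveSupport()
         .getOrCreate())

# Read raw weblog data previously landed in the data lake (Hive external table over HDFS).
raw = spark.table("staging.weblogs_ext")

# Simple UDF used during cleansing: normalizes a device field.
normalize_device = F.udf(lambda d: (d or "unknown").strip().lower(), StringType())

daily_metrics = (raw
    .withColumn("device", normalize_device(F.col("device")))
    .groupBy("event_date", "device")
    .agg(F.count(F.lit(1)).alias("hits"),
         F.countDistinct("session_id").alias("sessions")))

# Stage the aggregates in a partitioned Hive table; a downstream Sqoop export
# (or a JDBC write) then moves them into the target RDBMS.
(daily_metrics.write
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.daily_device_metrics"))
```

Staging the aggregates in a partitioned Hive table keeps the RDBMS export step decoupled from the Spark job, so the Sqoop export can be retried or rescheduled independently.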
Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache Nifi 1.6, Yarn, HBase, PL/SQL, Netezza 7.2, Pig 0.16, Sqoop 1.2, Flume 1.8
Confidential - Arlington, VA
Data Engineer /Data Analyst
Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Used Agile (SCRUM) methodologies for Software Development.
- Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
- Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
- Developed live reports in a drill-down mode to facilitate usability and enhance user interaction.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Wrote Python scripts to parse weekly XML extracts, clean the raw data, and load it into the database (see the sketch after this list).
- Worked with the AWS CLI to aggregate cleaned files in Amazon S3 and with Amazon EC2 clusters to deploy files into S3 buckets.
- Used the AWS CLI with IAM roles to load data into the Redshift cluster.
- Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.
- Extensive development on the Netezza platform using PL/SQL and advanced SQL.
- Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.
- Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources; also debugged and deployed reports in SSRS.
- Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
- Development of routines to capture and report data quality issues and exceptional scenarios.
- Creation of Data Mapping document and data flow diagrams.
- Developed Linux Shell scripts by using Nzsql/Nzload utilities to load data from flat files to Netezza database.
- Involved in generating dual-axis bar chart, Pie chart and Bubble chart with multiple measures and data blending in case of merging different sources.
- Developed dashboards in Tableau Desktop and published them on to Tableau Server which allowed end users to understand the data on the fly with the usage of quick filters for on demand needed information.
- Created dashboard-style reports using QlikView components such as List Box, Slider, Buttons, Charts, and Bookmarks.
- Coordinated with Data Architects and Data Modelers to create new schemas and views in Netezza to improve report execution time, and worked on creating optimized data mart reports.
- Worked on QA of the data and on adding data sources, snapshots, and caching to the reports.
- Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
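A minimal sketch of the weekly XML parse-and-cleanse step described above. Tag names, field names, and file paths are hypothetical placeholders; the cleaned output was then staged for loading (e.g., pushed to S3 with the AWS CLI for Redshift).

```python
# Illustrative sketch of the XML parsing and cleansing scripts; names are placeholders.
import csv
import xml.etree.ElementTree as ET

def parse_weekly_xml(xml_path):
    """Extract one record per <transaction> element from a weekly XML extract."""
    tree = ET.parse(xml_path)
    records = []
    for txn in tree.getroot().iter("transaction"):
        records.append({
            "txn_id": (txn.findtext("id") or "").strip(),
            "amount": (txn.findtext("amount") or "").strip(),
            "region": (txn.findtext("region") or "unknown").strip().lower(),
        })
    return records

def clean(records):
    """Drop rows with missing keys and coerce amounts to float."""
    cleaned = []
    for rec in records:
        if not rec["txn_id"]:
            continue
        try:
            rec["amount"] = float(rec["amount"])
        except ValueError:
            continue
        cleaned.append(rec)
    return cleaned

if __name__ == "__main__":
    rows = clean(parse_weekly_xml("weekly_extract.xml"))
    with open("weekly_extract_clean.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["txn_id", "amount", "region"])
        writer.writeheader()
        writer.writerows(rows)
```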
Environment: SAS, SQL, Teradata, Oracle, PL/SQL, UNIX, XML, Python, AWS, SSRS, TSQL, Hive, Sqoop
Confidential - Washington, DC
ETL Data Analyst
Responsibilities:
- Defined and modified standard design patterned ETL frameworks, Data Model standards guidelines and ETL best practices.
- Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review designs and business test scenarios, and support UAT participation and validation of data from multiple sources.
- Performed detailed data investigation and analysis of known data quality issues in related databases through SQL.
- Performed data validation, data profiling, data auditing and data cleansing activities to ensure high quality Business Objects report deliveries.
- Configured sessions for different situations including incremental aggregation, pipe-line partitioning etc.
- Created effective Test Cases and performed Unit and Integration Testing to ensure the successful execution of data loading process.
- Created SSIS Packages to export and import data from CSV files, Text files and Excel Spreadsheets.
- Generated periodic reports based on the statistical analysis of the data from various time frame and division using SQL Server Reporting Services (SSRS).
- Developed different kinds of reports such as sub-reports, charts, matrix reports, and linked reports.
- Analyzed the client data and business terms from a data quality and integrity perspective.
- Worked to ensure high levels of data consistency between diverse source systems including flat files, XML and SQL Database.
- Developed and ran ad hoc data queries against multiple database types to identify systems of record, data inconsistencies, and data quality issues.
- Conducted design discussions and meetings to arrive at the appropriate data mart design using the Inmon methodology.
- Maintained Excel workbooks, such as development of pivot tables, exporting data from external SQL databases, producing reports and updating spreadsheet information.
- Worked with Quality Improvement, Claims, and other operational business owners to ensure appropriate actions take place to address rejections and to ensure reprocessing of previously rejected data.
- Ensured the quality, consistency, and accuracy of data in a timely, effective and reliable manner.
- Worked with the Business analyst for gathering requirements.
- Created SSIS packages to load data into the data warehouse using various SSIS tasks such as Execute SQL Task, Bulk Insert Task, Data Flow Task, File System Task, Send Mail Task, ActiveX Script Task, and XML Task, along with various transformations.
- Used Sqoop & flume for Data ingestion.
- Migrating all the programs, Jobs and schedules to Hadoop.
- Used Erwin for relational database and dimensional data warehouse designs.
- Conducted and participated in JAD sessions with the users, modelers, and developers for resolving issues.
- Designed physical / logical data models based on Star and snowflake schema using Erwin modeler to build an integrated enterprise data warehouse.
Environment: SQL, SSIS, Data Analytics, MDM, Oracle 10g, Tableau, TOAD, Informatica Power center, Erwin, Windows XP, Excel.
Confidential
ETL Reporting Analyst
Responsibilities:
- Analyzed business changes, architecture changes, data mapping requirements, and design specification documents for the newly migrated data warehouse to create business-rule validation SQL scripts and a health report of the data.
- Used Informatica PowerCenter 9.1.0 client components such as Designer, Workflow Manager, and Workflow Monitor.
- Performed error checking and validated ETL procedures and programs using Informatica session logs for exceptions.
- Created a parallel process to extract, apply business transformations to, and load the source data from relational database sources into a data model in the Teradata data warehouse (multiple tables), and compared the target data each business day.
- Created complex SQL queries in the Teradata data warehouse environment to test the data flow across all stages (see the sketch after this list).
- Coded complex SQL queries using advanced SQL skills to load, clean, format, and store data using the AQT query tool.
- Used BI (Business Intelligence) reporting tools Cognos and Tableau to support development of BI dashboards and highlight KPIs across brands, locations, and channels, including multi-source data integration and development in the BI tools.
- Identified issues/defects and tracked them to resolution, performed root cause analysis with possible solutions for errors encountered, and prepared status reports.
- Performed thorough peer-to-peer training and trained direct reports, developing internal talent to expand advanced analytics capabilities, including modeling, automation, and data visualization/presentation techniques.
- Attended workshops and interacted with business users to understand their needs, requirements, and business rules in order to design, develop, and test the software.
- Coded a QC framework and validated performance, scalability, and reliability of the enterprise data warehouse.
- Prepared release documents, test summary reports, test closure reports, and sign-off documents for each release.
- Created test plans, test cases, and test scripts and maintained them using the enterprise test management tools HP ALM and JIRA.
- Documented test results and sent the QA Metrics Report to the client.
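A minimal sketch of the kind of business-rule validation behind the data health report described above. The actual checks were SQL scripts run in the Teradata warehouse through query tools such as AQT; this sketch uses a generic Python DB-API connection as a stand-in, and all rule names, table names, and queries are hypothetical.

```python
# Illustrative sketch of rule-based warehouse validation; rules and tables are placeholders.
# Each rule is a SQL query expected to return a single number that should be zero when the
# business rule holds.
VALIDATION_RULES = {
    "null_customer_keys":
        "SELECT COUNT(*) FROM dw.customer_dim WHERE customer_key IS NULL",
    "stage_vs_target_rowcount":
        "SELECT ABS((SELECT COUNT(*) FROM stg.orders) - (SELECT COUNT(*) FROM dw.order_fact))",
    "duplicate_order_ids":
        "SELECT COUNT(*) FROM (SELECT order_id FROM dw.order_fact "
        "GROUP BY order_id HAVING COUNT(*) > 1) dups",
}

def run_health_report(connection):
    """Run each business-rule query and report pass/fail for the daily health report."""
    results = {}
    cursor = connection.cursor()
    for name, sql in VALIDATION_RULES.items():
        cursor.execute(sql)
        violations = cursor.fetchone()[0]
        results[name] = {"violations": violations, "passed": violations == 0}
    return results
```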
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint