
Senior Data Engineer/Data Scientist Resume


Rocky Hill, CT

SUMMARY:

  • Over 10 years of experience in system analysis, design, development, and testing across various projects; 7+ years of MS SQL Server database development; 5+ years with the Microsoft BI stack (SSIS, SSMS, SSRS, SSAS); 5+ years of data warehouse development experience.
  • 4+ years of experience with various deep learning and reinforcement learning techniques such as CNNs, RNNs, and LSTMs.
  • Expertise in programming tools such as Python, R, and SAS, with experience across the complete Software Development Life Cycle (SDLC) in Agile/Scrum environments.
  • Designed and developed ELT pipelines in Microsoft Azure utilizing Azure SQL Server, Data Factory, Azure Databricks, and Data Lake Storage.
  • Streamed data into Azure Delta Lake using Databricks.
  • Extensive experience in handling Large Datasets of Structured and Unstructured data and building Models using various Machine Learning Algorithms and Techniques.
  • Proficient with data analysis and modeling packages such as NumPy, SciPy, Pandas, Beautiful Soup, Scikit-Learn, Matplotlib, and Seaborn in Python, and dplyr, tidyr, and ggplot2 in R.
  • Deep understanding of various Machine Learning Algorithms like Linear and Logistic Regression, Decision Trees, Ensemble Methods, Recommendation systems, Clustering, Time series and Forecasting, Predictive Modeling, PCA, Neural Network algorithms to train and test the huge data sets.
  • Experience in working with NoSQL databases like MongoDB
  • 2+ years of experience in IBM DataStage design and development of parallel ETL jobs
  • 2+ years of experience in Unix shell scripting for ETL job-run automation and DataStage engine administration
  • 2+ years of experience in loading, optimizing, and querying Netezza/IBM PureData appliances
  • 3+ years working with Microsoft SQL Server Change Data Capture, utilized for incremental loading
  • Over a year working with Reference Data Management (RDM) for data warehousing
  • 2+ years using R and Python to build machine learning algorithms for predictive models
  • 4+ years of experience in design and development with Oracle PL/SQL, versions 12c/11g
  • Expert level skills in Data Modeling, Data Mapping, Table Normalization, Table Denormalization, Optimization and Tuning, RDBMS concepts and constructs, Kimball and Inmon Data warehousing methodologies.
  • Experience in software Analysis, Design, Development, Testing, Implementation and Production Support of Client/Server and Web based applications.
  • Proficient in installing SQL Server and configuring its tools.
  • Hands on experience in migration of database from SQL Server 2012 to SQL Server 2016.
  • Experience in various Development Methodology like AGILE, WATERFALL, SCRUM
  • Expert in T-SQL & PL/SQL DDL/DML; performed most SQL Server Enterprise Manager and Management Studio functionality using T-SQL scripts and batches.
  • Expert in creating indexed views, complex stored procedures, effective functions, and appropriate triggers to facilitate efficient data manipulation and data consistency.
  • Expert in data extraction, transformation, and loading (ETL) using SQL Server Integration Services (SSIS), DTS, Bulk Insert, and BCP from sources such as Oracle, Excel, CSV, and XML.
  • Expert in writing SSIS configurations, logging, procedural error handling, custom logging, data error handling, and master-child load methods using control tables.
  • Experience in using tools like index Tuning Wizard, SQL Profiler, and Windows Performance Monitor for Monitoring and Tuning MS SQL Server Performance.
  • Experience in creating Jobs, Alerts, SQL Mail Agent, and scheduled DTS and SSIS Packages.
  • Good knowledge of star and snowflake schemas in data warehouse dimensional modeling.
  • Extensive experience creating SSRS reports like Parameterized report, Bar Report, Chart Reports, Linked Report, Sub Report, Dashboard, Scorecards reports.
  • Extensive experience creating ROLAP/MOLAP/HOLAP and KPI’s using SSAS cubes, processing cube and deploying cube to Production.
  • Implementation of Master Data Management in Data Warehouse ensuring data governance, removal of duplicate records, data cleansing and profiling.
  • Extensive hands-on experience in core .NET technologies like ASP, ASP.NET, ADO.NET, HTML, AJAX, CSS, C#.NET, VB, and XML to create web forms.
  • Experience in designing Database Models using MS Visio and ER-Win.
  • Experience writing deployment script using DOCKER, SQLCMD, RS, DTUTIL and release document for release management team.
  • Worked extensively on system analysis, design, development, testing and implementation of projects (Complete SDLC) and capable of handling responsibilities independently as well as a proactive team member.
  • Good SQL Server Administration skills including backup recovery, database maintenance, user authorizations, Database creation, Tables, indexes, Partitions, running Database Consistency Checks using DBCC.
  • Expert in designing complex reports such as reports with cascading parameters, drill-through and drill-down reports, parameterized reports, report models, and ad-hoc reports using SSRS, COGNOS, QlikView, Tableau, Zoho, Crystal Reports, and Excel pivot tables based on OLAP cubes, making use of multiple-value selection in parameter pick lists and dynamic matrix reports.
  • Good business and systems analysis skills to understand requirements and customize client solutions.

TECHNICAL SKILLS:

Operating Systems: Windows 7/10, Unix

Databases: MS SQL Server R2, Oracle 12c, MySQL, MS Access, DB2, Netezza; NoSQL DB: MongoDB

ETL Tools: SQL Server Integration Services (SSIS), IBM DataStage

Database Tools: SQL Profiler, Management Studio, Index Analyzer, SQL Agents, SQL Alerts, Visual SourceSafe, Microsoft SQL Server CDC, IBM CDC, AWS EC2, AWS RDS, MapReduce

Languages: R, T-SQL, Visual Basic 6.0, C, C++, C#, Java, HTML, PL/SQL, VBA, Python, Hadoop, Spark

Reporting Tools: SQL Server Reporting Services (SSRS), Tableau, MS Excel

DB Modeling Tools: Erwin, Embarcadero

Frameworks: Docker

PROFESSIONAL EXPERIENCE:

Confidential, Rocky Hill, CT

Senior Data Engineer/Data Scientist

Responsibilities:

  • Created and maintained an optimal data pipeline architecture in the Microsoft Azure cloud
  • Utilized web scraping techniques like Beautiful Soup in Python to extract and organize competitor data
  • Performed time series analysis and decomposition (ARIMA, STL, ETS) on product sales records to understand seasonality and trend of in-store hair product sales
  • Developed a prediction model using XGBoost and Random Forest to predict the number of units sold given independent variables like week, unit price, and store count
  • Recommended Sales Promotional strategy based on in depth analysis of sales trend and model predictions
  • Took proof-of-concept project ideas from the business, then led, developed, and created production pipelines that deliver business value using Azure Data Factory
  • Exposed transformed data on the Azure Databricks Spark platform in Parquet format for efficient data storage
  • Architected scalable and efficient cloud solutions for various analytics requirements utilizing SQL databases and ELT techniques
  • Assembled large, complex data sets that meet functional/non-functional business requirements.
  • Implemented Spark SQL jobs for data transformation
  • Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies like Hadoop Hive, Azure Data Lake storage
  • Built the data platform for analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
  • Worked with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
  • Kept data separated and secure across national boundaries through multiple data centers and regions.
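The time-series decomposition mentioned above (separating trend from seasonality in weekly sales) can be sketched in plain Python. This is a minimal additive-decomposition illustration on made-up weekly unit-sales numbers, not the production ARIMA/STL/ETS models:

```python
# Minimal additive decomposition sketch: sales = trend + seasonal + residual,
# on synthetic weekly data -- illustrative only.

def moving_average_trend(series, period):
    """Centered moving average; even periods use the standard 2xMA
    (weights 1,2,...,2,1). Positions without a full window stay None."""
    n, half = len(series), period // 2
    trend = [None] * n
    for i in range(half, n - half):
        if period % 2 == 0:
            w = series[i - half:i + half] + series[i - half + 1:i + half + 1]
            trend[i] = sum(w) / (2 * period)
        else:
            trend[i] = sum(series[i - half:i + half + 1]) / period
    return trend

def seasonal_indices(series, trend, period):
    """Average detrended value at each position of the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    return [sum(b) / len(b) for b in buckets]

# Synthetic weekly sales: linear upward trend plus a period-4 seasonal pattern.
sales = [10 + 0.5 * i + [3, -2, 1, -2][i % 4] for i in range(24)]
trend = moving_average_trend(sales, 4)
season = seasonal_indices(sales, trend, 4)
print([round(s, 2) for s in season])
```

With a clean linear trend the centered moving average recovers it exactly, so the seasonal indices come back as the injected pattern.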

Tools: Hadoop, Hive, Azure Data Lake, Azure Data Factory, Spark, Databricks, Dremio, Tableau, PowerBI, Python, R, Knime

Confidential, New Hyde Park, NY

Data Scientist/Senior Data Engineer

Responsibilities:

  • Worked with the Perficient consulting group to build an enterprise data warehouse solution for Northwell Health
  • Developed predictive models with clinical analysts for LACE score index calculations with R machine learning algorithms
  • Performed data transformation and wrangling using Python scripts
  • Performed statistical analysis using linear regression to find correlations between data points
  • Inferred model results based on multivariate prediction models like K-nearest neighbors, logistic regression, Naïve Bayes, etc., with Bayesian inference methods.
  • Analyzed and processed complex datasets about patient readmission rates and ways to reduce them.
  • Designed and developed DataStage code for extracting source data into a persistent stage layer, then to a Standard Interface gateway using SCD II (Type 2 slowly changing dimension) methodology and Microsoft Change Data Capture
  • Developed Spark programs using Python (PYSpark) API for data wrangling and transformations.
  • Queried Structured Data using Spark SQL in order to enable rigorous analysis
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Application Master, Node Manager, Name Node, Data node and MapReduce concepts
  • Enhanced the vendor's ETL framework to ensure a more dynamic and parameterized design to extract data from source tables using ETL configuration tables
  • Designed ELT jobs to move, integrate, and transform big data from various sources to a single target database.
  • Scraped webpages using XPATHs, API querying, using parsing frameworks like BeautifulSoup, and LXML
  • Used Regular expression (Regex) in Python and SQL to extract and extrapolate vital medical metrics for analytics from lab results notes.
  • Maintained over 100 ETL jobs that populate the IBM UDMH (Unified Data Model for Healthcare) atomic model and dimensional model
  • Extensive understanding of Linstedt Data Vault modeling and ETL loading implementation and architecture
  • Designed ETLs to populate anchor tables, array tables, bridge tables, and detail tables
  • Loaded data into the Netezza database, ensuring proper distribution of data across data slices with appropriate distribution keys
  • Converted data mapping documents to ETL/DataStage jobs loading data into Netezza.
  • Created Unix scripts to execute jobs automatically and on a schedule
  • Debugged and monitored parallel job failures and errors that arise from source system anomalies or misinterpretation of data mappings
  • Designed ELT jobs to load Big Data over 4 billion records from SQL Server Data Source to Massively Parallel Processing (MPP) IBM Pure Data Appliance
  • Performed various performance tuning techniques to leverage the columnar data storage, and distributed computing architecture of Netezza DWH appliance
  • Implement CDC from Source system to Atomic and Dimensional tables, ensuring hard and soft deletes are flagged at the record level
  • Created data validation scripts used to verify the correctness of ETL logic and transformations between source system tables and dimension tables.
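The regex-based extraction of medical metrics from lab-result notes described above can be sketched with Python's standard `re` module. The note text, metric names, and patterns below are made up for illustration:

```python
import re

# Hypothetical free-text lab note; each pattern pulls one numeric metric.
NOTE = "Pt BP 120/80 mmHg, HbA1c: 6.4 %, glucose 98 mg/dL, temp 98.6F"

PATTERNS = {
    "systolic_bp":  r"BP\s+(\d+)/\d+",        # first number of the BP pair
    "diastolic_bp": r"BP\s+\d+/(\d+)",        # second number of the BP pair
    "hba1c":        r"HbA1c:?\s*([\d.]+)",
    "glucose":      r"glucose\s+([\d.]+)",
}

def extract_metrics(note, patterns=PATTERNS):
    """Return {metric_name: float} for every pattern that matches the note."""
    out = {}
    for name, pat in patterns.items():
        m = re.search(pat, note, flags=re.IGNORECASE)
        if m:
            out[name] = float(m.group(1))
    return out

print(extract_metrics(NOTE))
```

In SQL the same idea maps to pattern-matching functions over the notes column, with the extracted values landing in typed analytics columns.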

Confidential, NY

Data Engineer

Responsibilities:

  • Designed and developed an SCD Type 1 incremental loading process from a SQL Server data source to an Oracle staging database using IBM InfoSphere Change Data Capture, PL/SQL packages, and DataStage
  • Set up InfoSphere Change Management Console subscriptions to a SQL Server database, set bookmarks at the start of log reading, and configured Oracle target tables for record insert
  • Designed and Developed DataStage jobs to process FULL Data loads from SQL Server Source to Oracle Stage
  • Imported Table Definitions using ODBC plug in in DataStage Repository
  • Designed one parameterized job using parameter sets that can be reused for multiple tables, implementing Runtime Column Propagation in DataStage
  • Created DataStage jobs that wrote into parameter files so that subsequent jobs in the sequence can read them for proper execution
  • Designed DataStage sequence jobs that controlled ETL for multiple tables in a subject area and send a success email once the job is complete
  • Implemented complex Datastage Transformer logic for various business rules in the ETL
  • Designed and developed incremental load logic that used a control table storing the min and max LSN (transaction commit cycle ID) for the successfully loaded transactions.
  • Implemented Error Handling in Datastage and designed Error jobs to notify user and update log table
  • Designed performance boosting jobs dat ran the Datastage job on 4 nodes taking advantage of Datastage Parallel execution of different partitions to reduce job run time
  • Implemented Microsoft SQL Server Change Data Capture for SQL Server data sources, taking advantage of built-in functions such as sys.fn_cdc_get_min_lsn, sys.fn_cdc_get_max_lsn, and cdc.fn_cdc_get_net_changes
  • Installed SQL Server 2012 and Management tools using SQL Server Setup Program.
  • Created Unique index on business keys in table enabling data validation and integrity.
  • Designed and developed an ETL control log table that records the number of inserts, updates, deletes, and error messages for each running process
  • Created tables in Oracle specifying the appropriate tablespace and data storage parameters (initial extent, next extent, pctincrease)
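The control-table pattern described above (a bookmark of the last successfully loaded LSN, so each run picks up only new transactions) can be sketched with SQLite standing in for the real SQL Server source and Oracle target. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source changes, keyed by an increasing LSN-like sequence number
    CREATE TABLE source_changes (lsn INTEGER PRIMARY KEY, payload TEXT);
    -- Target table populated incrementally
    CREATE TABLE target (lsn INTEGER PRIMARY KEY, payload TEXT);
    -- Control table: one row per process, recording the max LSN already loaded
    CREATE TABLE etl_control (process TEXT PRIMARY KEY, max_lsn INTEGER);
    INSERT INTO etl_control VALUES ('demo_load', 0);
""")

def incremental_load(conn, process="demo_load"):
    """Load only rows with lsn greater than the control bookmark,
    then advance the bookmark -- the min/max-LSN window pattern."""
    (last_lsn,) = conn.execute(
        "SELECT max_lsn FROM etl_control WHERE process = ?", (process,)
    ).fetchone()
    rows = conn.execute(
        "SELECT lsn, payload FROM source_changes WHERE lsn > ? ORDER BY lsn",
        (last_lsn,),
    ).fetchall()
    if rows:
        conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
        conn.execute(
            "UPDATE etl_control SET max_lsn = ? WHERE process = ?",
            (rows[-1][0], process),
        )
    conn.commit()
    return len(rows)

conn.executemany("INSERT INTO source_changes VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
print(incremental_load(conn))   # first run loads all pending changes
conn.execute("INSERT INTO source_changes VALUES (4, 'd')")
print(incremental_load(conn))   # second run loads only the new row
```

Advancing the bookmark only after a successful insert is what makes the job restartable: a failed run leaves the bookmark untouched, so the next run re-reads the same window.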

Confidential

Senior Data Warehouse / IBM DataStage Developer

Responsibilities:

  • Understood business/data transformation rules, business structure and hierarchy, relationships, data transformation through mapping development.
  • Reverse Engineered the Stage Schema to create a Data Model using Embarcadero ER/Studio Architect.
  • Created a Data Lineage from Source Table to Stage Tables showing data mapping rules, transformations and business logic using Embarcadero ER/Studio Architect.
  • Developed SSIS packages to extract data from a SQL Server source and load into an Oracle database using Oracle Attunity drivers
  • Developed PL/SQL packages, procedures, functions, and views for business requirements
  • Worked extensively on Ref Cursor, External Tables and Collections.
  • Implemented large table partitions using RANGE Partition and LIST Partition techniques
  • Implemented and designed procedures using FORALL and BULK COLLECT concepts
  • Experience with performance tuning for Oracle RDBMS using EXPLAIN PLAN and hints
  • Expertise in dynamic SQL, collections, and exception handling
  • Implemented Partition Exchange to migrate data for over 2 billion records efficiently from MS SQL Server to Oracle DB
  • Created Stage tables to process Source data per logical partition for ETL Parallelism in SSIS.
  • Created an SSIS Package to efficiently load CLOB columns from SQL Server to Oracle using Attunity drivers and Conditional Split transformation
  • Created source-to-target mapping documentation in Excel outlining the data conversion/transformation routines implemented from SQL Server source table columns to Oracle target table columns
  • Created source analysis documentation describing the tables in the source system with relationships, outlining business keys
  • Modified table parameters to enable faster bulk loads with NOLOGGING and the PARALLEL hint
  • Enabled parallel DML inserts (direct-load INSERT)
  • Developed SSIS packages to extract data from MS SQL DB to Oracle DB for over 2 billion records and 1 TB data volume
  • Developed Oracle packages with procedures for full and incremental ETL processes using the MERGE statement
  • Implemented SCD II in incremental loading, keeping historical track of inserted, updated, and deleted records using SQL MERGE statements, ensuring only changed records were processed
  • Implemented SCD Type 1 in incremental loading of a huge historical table from source to a stage environment
  • Modified and validated Audit Trail Triggers capturing CDC records.
  • Partitioned tables wif size over 100GB to increase availability and efficiency of full and incremental load into stage environment.
  • Created Mapping documents for data analytics vendor for various Source systems
  • Created Source to Target and Source Analysis documentation for various Source Systems imported to Stage Environment
  • Implemented Parallel data reading wif separate data flows running in parallel and loading into one table
  • Participated in daily stand-up meetings, working on the product backlog toward sprint goals
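The SCD Type 2 logic described above (expire the current dimension row when attributes change, insert a new current version, keep the full history) can be sketched in plain Python. In production this was done with SQL MERGE statements; the dictionaries here are only stand-ins for dimension rows:

```python
from datetime import date

def scd2_merge(dimension, incoming, today):
    """Apply SCD Type 2: for each incoming business key, if attributes
    changed, expire the current row and append a new current version."""
    current = {r["key"]: r for r in dimension if r["is_current"]}
    for key, attrs in incoming.items():
        row = current.get(key)
        if row is None:
            # brand-new key: first version of the row
            dimension.append({"key": key, "attrs": attrs,
                              "valid_from": today, "valid_to": None,
                              "is_current": True})
        elif row["attrs"] != attrs:
            row["valid_to"] = today          # expire the old version
            row["is_current"] = False
            dimension.append({"key": key, "attrs": attrs,
                              "valid_from": today, "valid_to": None,
                              "is_current": True})
        # unchanged rows are left alone -- only changed records processed
    return dimension

dim = []
scd2_merge(dim, {"C1": {"city": "NY"}}, date(2020, 1, 1))
scd2_merge(dim, {"C1": {"city": "Boston"}}, date(2020, 6, 1))
print(len(dim), sum(r["is_current"] for r in dim))
```

After the second load the dimension holds two versions of C1, exactly one of them current, which is the historical-tracking guarantee SCD Type 2 provides.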

Environment: MS SQL SERVER 2014/2012, SSIS, SSRS, Business Intelligence Studio, DTS, Oracle12C, T-SQL, VSS 2005, Erwin r7.2, SQL profiler, Embarcadero ER Studio Data Architect, Dot NET framework 3.5, ASP.NET,C#, XML, PL/SQL, Star and Snowflake schema Data Models

Confidential, Murray Hill, NJ

SQL Server Database/ SSIS/TSQL

Responsibilities:

  • Installed SQL Server 2012 and Management tools using SQL Server Setup Program.
  • Used Tableau Desktop to analyze and obtain insights into large data sets
  • Created primary key and foreign key relationships between tables with appropriate clustered and non-clustered indexes
  • Designed, reviewed, and created primary objects (views, indexes, etc.) based on logical design models, user requirements and physical constraints.
  • Partitioned tables and indexes across more than one filegroup in the database
  • Developed a logical Data model and Star Schema based on client business process.
  • Wrote T-SQL queries, functions, Stored Procedures and used them to build packages.
  • Used SQL Server Profiler for auditing and analyzing events that occurred during a particular time window, storing the traces in scripts.
  • Performed daily tasks including backup and restore by using SQL Server 2012 tools like SQL Server Management Studio, SQL Server Profiler, SQL Server Agent, and Database Engine Tuning Advisor.
  • Written complex Stored Procedures, Triggers, and Functions for SQL Server 2012.
  • Created traces using SQL server profiler to find long running queries and modify those queries as a part of Performance Tuning operations.
  • Extensively used joins and sub queries to simplify complex queries involving multiple tables.
  • Monitored Strategies, Processes and Procedures to ensure the Data Integrity, Optimized and reduced Data Redundancy, maintained the required level of security for all production and test databases.
  • Ensured Data Recovery, Maintenance and space requirements for physical Database.
  • Re-organized database structures as needed and automated procedures to run at regular intervals.
  • Optimized long-running stored procedures & queries for effective data retrieval.
  • Wrote PL/SQL queries, functions, Stored Procedures and used them against an Oracle Database
  • Tuned SSIS package performance by maximizing parallel processing in both Control flow and Data flow levels
  • Implemented complex C# scripting on ETL tool (SSIS) for error column name identification during data level transformation
  • Translated business requirements (BRD) into technical design specifications (FRD).
  • Utilized Excel Source and Microsoft Access as source for SSIS Extraction needs
  • Identified the conformed dimensions, junk dimensions, and SCDs based on client source system analysis.
  • Designed SSIS packages to transfer data from various sources like text files, XML files, Excel, and flat files to SQL Server 2008/2012
  • Implemented default logging, Custom Logging and configured Error Handling to handle various OnError events
  • Scheduled SSIS package jobs and prepared a runbook for the production support team
  • Prepared Release Notes, Test plans and Deployment document for Release Team on various SSIS packages developed
  • Used SQL Server Reporting Services (SSRS) to deliver enterprise, Web-enabled reporting, creating reports that draw content from a variety of data sources.
  • Responsible for optimizing all indexes, SQL queries, stored procedures to improve the quality of software.
  • Performed data transformations such as adding derived columns, counting records, and merging data while pulling data from source to target.
  • Created XML configuration files so that packages can be executed on any server/database by changing the configuration path in the XML files.
  • Extensively used SQL Loader to Load Data from Flat Files to Oracle Database.
  • Played Active part in the development of SQL Server Maintenance plan, scheduling of Jobs, Alerts and Troubleshooting
  • Used Version Control software like TFS- Team Foundation Server Version control and STAR Team for package development history and audit.
  • Built an OLAP model based on dimensions and facts for efficient data loads, using a star schema structure across report levels.
  • Implemented reconciliation processes, and identified and resolved inconsistencies
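The XML configuration approach mentioned above (so a package runs against any server/database by editing a config file rather than the package) can be sketched with Python's standard library. The file layout below is illustrative, not the actual SSIS .dtsConfig schema:

```python
import xml.etree.ElementTree as ET

# Illustrative config: one <connection> element per environment-specific
# setting; swapping this file retargets the job without touching the package.
CONFIG_XML = """
<configuration>
  <connection name="Source" server="DEVSQL01" database="Staging"/>
  <connection name="Target" server="DEVORA01" database="DW"/>
</configuration>
"""

def load_connections(xml_text):
    """Parse connection settings keyed by connection name."""
    root = ET.fromstring(xml_text)
    return {
        c.get("name"): {"server": c.get("server"),
                        "database": c.get("database")}
        for c in root.findall("connection")
    }

conns = load_connections(CONFIG_XML)
print(conns["Source"]["server"])
```

Deploying the same package to QA or production then means shipping a different XML file, with the server and database names as the only delta between environments.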

Environment: MS SQL SERVER 2008R2/2012, SSIS, SSRS, Business Intelligence Studio, DTS, Oracle10g, T-SQL, VSS 2005, Erwin r7.2, SQL profiler, Dot NET framework 3.5, ASP.NET,C#, XML, PL/SQL, Star and Snowflake schema Data Models

Confidential, Chicago, IL

SQL Database Developer/SSRS/SSAS Developer

Responsibilities:

  • Performed System Study and Requirements Analysis, prepared Data Flow Diagrams, Entity Relationship Diagrams, Data Diagrams, Table Structures.
  • Maintained broker account databases, ensuring that business rules are obeyed at the database level
  • Created SSIS packages to load daily reports from different trading Platforms for application analysis
  • Designed Excel Power Pivot presentations of SSAS cubes for visualization of multidimensional data
  • Involved in creation of data marts with dimensions using STAR and SNOWFLAKE schemas.
  • Worked with an offshore development and production support team based in Pune, India; all daily jobs were supported by the offshore team, which engaged the first-level India development team for off-hours issues and escalated to the US team when an issue could not be resolved.
  • Created a dynamic package that executes child packages based on expressions in the Execute Package task using environment variables
  • Managed Security Master Table ensuring CUSIP, SEDOL or CINS security identifiers were maintained precisely
  • Identified Master Data for business Account Master, Portfolio Master, Asset Master
  • Wrote TSQL queries to ensure consistency and correctness of Master Data based on Business Requirements
  • Develop the Documents for Logging/Error Handling for SSIS Packages.
  • Published SSRS report to a SharePoint site configured for Report access in different departments
  • Scheduled jobs to run SSIS packages as a nightly feed to the DSS system for fresh daily data using SQL Server Agent.
  • Wrote complex T-SQL Queries to perform data validation and graph validation to make sure test results matched back to expected results based on business requirements.
  • Created pivot tables to structure reports properly, such as Commission, AUM, Holding, Portfolio, and Trade Limit reports.
  • Scheduled the reports to run on a daily and weekly basis in Report Manager and emailed them to the director and analysts for review in Excel, using IBM Tivoli and cron jobs.
  • Participated in testing during the UAT (QA testing). Mentored and monitored team development and status.
  • Gathered requirements from the end user and checked the structure of the schema and data with the Data Modeler.
  • Imported metadata and created the Presentation, Business, Data Source, and Physical layers.

Environment: SQL Server 2008/2008R2, SQL Server Integration Services, MS Excel 2012, MS Access 12 Enterprise Manager, Management Studio, SQL Profiler and SSRS, Net 2.5-3.0, Ado.net, Asp.net, HTTP

Confidential, Manhattan, NY

T-SQL/SSIS/SSRS/SQL Server Database Developer

Responsibilities:

  • Installed SQL Server 2008 and Management tools using SQL Server Setup Program.
  • Created SSRS reports showing various KPI’S like Medical Claims Ratio and Customer Satisfaction
  • Used SSIS packages to roll out data to live tables and to the Medical Claim Processing database.
  • Involved in extensive SSRS Fraud, Medical Claims reporting and regular maintenance and verification of reports.
  • Created SSIS packages for extracting, transforming, and loading data from SQL Server 2008 and flat files.
  • Developed SQL Server Integration Services (SSIS) packages to transform data from SQL 2005 to MS SQL 2008.
  • Converted all existing MS SQL DTS packages by adding extra SSIS tasks.
  • Responsible for creating batches & scripts for implementing logical design to T-SQL.
  • Responsible for creating database objects (tables, views, stored procedures, triggers, etc.) to provide structure to store data and to maintain the database efficiently.
  • Created views and indexed views to reduce database complexity for end users.
  • Performed T-SQL tuning and query optimization using SQL Server 2008.

Environment: SQL Server 2005/2008, SQL Server Integration Services, MS Visio, MS Excel 2008, MS Access 2008, Enterprise Manager, Management Studio, MS SQL Query Analyzer, SQL Profiler and SSRS and Crystal report 9.0, .Net 4.0, WCF,HTML5,Javascript,Asp.net.
