
Data Scientist Resume


Bloomfield, CT

SUMMARY:

  • 8 years of IT experience in analysis, design, development, maintenance and documentation of data warehouses and related applications using ETL, BI tools, Client/Server and Web applications on UNIX and Windows platforms.
  • Hands-on experience with Statistics, Data Analysis, Machine Learning and Deep Learning using the R language.
  • Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction
  • Worked in the BI industry, with understanding and experience in Big Data using Teradata and Hadoop technologies.
  • Worked with R packages such as ggplot2 to understand data and develop applications.
  • Developed predictive models in R to predict customer churn and classify customers (an illustrative sketch follows this summary).
  • Worked with heterogeneous relational databases such as Teradata, SQL Server, and MS Access.
  • Strong experience in Metadata Management using Hadoop User Experience (HUE).
  • Experience in Data Analysis, Data Mining, Data Mapping, Data Quality, and Data Profiling.
  • Expertise in T-SQL: creating and using views, user-defined functions, indexes, and stored procedures involving joins and sub-queries across multiple tables, and establishing relationships between tables using primary and foreign key constraints.
  • Managed different versions of complicated code and distributed them to different teams in the organization using TFS.
  • Developed test packages to simulate errors to prepare for final deployments.
  • Experienced in performing Incremental Loads and Data cleaning in SSIS. Managed Error handling, logging using Event Handlers in SSIS.
  • Developed Custom Reports, Ad-hoc Reports by using SQL Server Reporting Services (SSRS).
  • Monitored strategies, processes and procedures to ensure data integrity, optimized and reduced data redundancy, and maintained the required level of security for all production and test databases.
  • Experienced in using database tools such as TOAD for data analysis.
  • Created custom data models to accommodate business metadata including KPIs, Metrics and Goals
  • Knowledge of complete Software Development Life Cycle
  • Administration of the database including performance monitoring, Query tuning & optimization
  • Experience in handling various kinds of files (flat file, CSV and Excel)
  • Experienced in handling concurrent projects and providing expected results in the given timeline
  • Worked on Agile Methodologies and used CA Agile Central
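
The following is a minimal, illustrative Python/scikit-learn sketch of the kind of churn-classification model described above. The original modeling was done in R; the file name, label column and features below are hypothetical placeholders rather than details of any actual engagement.

    # Hypothetical churn-classification sketch (the work above used R;
    # this is a Python/scikit-learn analogue with placeholder names).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customers.csv")                 # assumed input file
    X = pd.get_dummies(df.drop(columns=["churned"]))  # one-hot encode categorical features
    y = df["churned"]                                 # assumed binary churn label

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate performance on the held-out set.
    print(classification_report(y_test, model.predict(X_test)))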

TECHNICAL SKILLS:

Big Data Technologies (Hadoop, Pig, Hive, Sqoop, HBase, Spark, Scala), Microsoft SQL Server 2014/2012/2010, Teradata, PostgreSQL, SSIS/SSAS/SSRS, MS Excel, MS Visio, ERwin, TOAD, SharePoint, Windows Server 2012 R2/2008 R2, T-SQL, Agile and Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, Bloomfield, CT

Data Scientist

Responsibilities:

  • Worked on the Domains and Communities for Business Glossary and Information Governance Catalog.
  • Applied advanced statistical techniques to create and/or update predictive models for use in claims, pricing, strategy, underwriting, etc.
  • Evaluated model performance and robustness; used model results to score the portfolio and developed strategies to identify profitable new targets
  • Used packages like dplyr, tidyr and ggplot2 in R Studio for data visualization.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Conducted research on development and designing of sample methodologies, and analyzed data for pricing of client's products.
  • Utilized various new supervised and unsupervised machine learning algorithms/software to perform NLP tasks and compare performances.
  • Investigated market sizing, competitive analysis and positioning for product feasibility.
  • Worked on Business forecasting, segmentation analysis and Data mining.
  • Enhanced the DMV tool using Core Java and JDBC.
  • Created scripts to create new tables and queries for new enhancement in the application using TOAD.
  • Involved in loading data into HDFS and preprocessing it with Pig.
  • Used Spark SQL to load JSON data, create a schema RDD and load it into Hive tables, and handled structured data using Spark SQL (see the Spark sketch after this list).
  • Developed Spark jobs to parse the JSON data or XML data.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Reduced redundancy among incoming incidents by proposing rules to recognize patterns.
  • Worked on Clustering and classification of data using machine learning algorithms.
  • Analyzed data and recommended new strategies for root-cause analysis and for processing large data sets more quickly.
  • Hands-on experience in writing Pig Latin and using the Pig interpreter to run MapReduce jobs.
  • Worked on Hive and Pig queries and UDFs on different datasets, including joining them.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Worked on a robust automated framework in the data lake for metadata management that integrates various metadata sources, consolidates them, and updates Podium with the latest, high-quality metadata using big data technologies such as Hive and Impala.
  • Ingested data from a variety of sources such as Teradata, DB2, Oracle, SQL Server and PostgreSQL into the data lake using Podium, and resolved data transformation and interpretation issues during the process using Sqoop, Git and uDeploy.
  • Responsible for data governance processes and policies solutions using Data Preparation tools and technologies like Podium and Hadoop.
  • Created a data-profiling dashboard by leveraging Podium's internal architecture, which drastically reduced the time to analyze data quality using Looker reporting (see the profiling sketch after this list).
  • Worked on an analytical model for automating data certification in Data Lake using Impala.
  • Worked on an input agnostic framework for data stewards to handle their ever-emerging work group datasets and created a business glossary by consolidating them using Hive.
  • Worked on a robust comparison process to compare data modelers’ metadata with data stewards’ metadata and identify anomalies using Hive and Podium Data.
  • Created technical design documentation for the data models, data flow control process and metadata management.
  • Designed data models for Metadata Semantic Layer in ERwin data modeler tool.
  • Reverse-engineered and generated the data models by connecting to their respective databases.
  • Imported metadata from various applications and built end-to-end data lineage using ERwin.
  • Designed conceptual data models based on the requirements and interacted with non-technical end users to understand the business logic.
  • Modeled the logical and physical diagrams of the future state; delivered the BRD and the low-level design document.
  • Discussed the Data models, data flow and data mapping with the application development team.
  • Developed conceptual models using ERwin based on requirements analysis
  • Developed normalized Logical and Physical database models to design an OLTP system for insurance applications.
  • Used ERwin and TOAD to review the physical data model of Oracle sources, constraints, foreign keys and indexing of tables for data lineage.
  • Worked with Database Administrators, Business Analysts and Content Developers to conduct design reviews and validate the developed models
  • Identified, formulated and documented detailed business rules and Use Cases based on requirements analysis
  • Integrated the work tasks with relevant teams for smooth transition from testing to implementation
  • Extensive use of GIT as a versioning tool.
  • Worked on Agile methodologies and used CA Agile Central (Rally) and Kanban dashboards.
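
The Spark jobs described above were written in Scala; the following is a minimal, hedged PySpark analogue of loading JSON data and persisting it to a Hive table. The paths, column names and table names are hypothetical placeholders.

    # Minimal PySpark sketch of the JSON-to-Hive flow described above
    # (the original jobs were written in Scala; paths, columns and table
    # names are hypothetical).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("json-to-hive")
        .enableHiveSupport()   # needed so saveAsTable targets the Hive metastore
        .getOrCreate()
    )

    # Load semi-structured JSON; Spark infers a schema into a DataFrame.
    claims = spark.read.json("hdfs:///data/incoming/claims/*.json")

    # Handle the structured data with Spark SQL before persisting it.
    claims.createOrReplaceTempView("claims_raw")
    cleaned = spark.sql("""
        SELECT claim_id, member_id, CAST(claim_amount AS DOUBLE) AS claim_amount
        FROM claims_raw
        WHERE claim_id IS NOT NULL
    """)

    # Persist the result as a Hive table.
    cleaned.write.mode("overwrite").saveAsTable("analytics.claims_cleaned")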
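
The data-profiling dashboard itself was built on Podium and Looker; purely as an illustration of the underlying idea, here is a hedged PySpark sketch that computes basic per-column quality metrics (null counts and distinct counts). The table name is a hypothetical placeholder.

    # Illustrative sketch of simple per-column profiling metrics
    # (the real dashboard used Podium internals and Looker reporting;
    # the table name below is hypothetical).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("profiling-sketch").enableHiveSupport().getOrCreate()

    df = spark.table("analytics.claims_cleaned")
    total_rows = df.count()

    # Null count and distinct count are cheap signals of column-level data quality.
    for column in df.columns:
        nulls = df.filter(F.col(column).isNull()).count()
        distinct = df.select(column).distinct().count()
        print(f"{column}: nulls={nulls} ({100.0 * nulls / total_rows:.1f}%), distinct={distinct}")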

Environment: R, R Studio, Podium Data, Data Lake, HDFS, Hue, Hive, Impala, Spark, Scala, Pig, Looker, ERwin 9.64, HTML, JavaScript, Core Java, PostgreSQL, SSIS, Teradata, Classification Models, Rally, Kanban.

Confidential, Livonia, MI

ETL Developer/MDM Analyst

Responsibilities:

  • Worked on Confidential's Tier 1 Check Processing Application CXR and supported IEX and STP.
  • Performed analysis in different stages of the system development life cycle in order to support development and testing efforts, identify positive and negative trends, and formulate recommendations for process improvements and developments standards.
  • Created SSIS packages to load data from XML files, FTP server and SQL Server to SQL Server using Lookup, Derived Column, Conditional Split, OLE DB Command, Term Extraction, Aggregate and Pivot transformations, Execute SQL Task and Slowly Changing Dimension.
  • Recreated SQL Stored procedures to accommodate modified logic for Check duplicate detection
  • Created SSIS packages to read X9 files and load data into the database
  • Performed data analysis and data profiling using complex SQL on various source systems.
  • Involved in identifying the source data from different systems and mapping the data into the warehouse
  • Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues
  • Involved in defining the source to target data mappings, business rules, and data definitions
  • Created HBase tables to store variable data formats of input data coming from different portfolios
  • Managed excel spreadsheets, resolved discrepancies associated with metadata
  • Created technical design documentation for the data models, data flow control process and metadata management.
  • Designed, developed and deployed a website for clients to check the status of the Gates application in IEX using Python.
  • Developed entire frontend and backend modules using Python on the Django framework.
  • Used Python to extract information from customers' XML files.
  • Enhanced horizontally scalable APIs using Python Flask (see the Flask sketch after this list).
  • Imported metadata from various applications and built end-to-end data lineage using ERwin.
  • Involved in adding huge volumes of data in rows and columns to store data in HBase.
  • Worked with HUE and analyzed the datasets.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Successfully loaded files to Hive and HDFS from the Teradata database.
  • Imported and exported data into HDFS and Hive from Teradata using Sqoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Experienced in managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured data.
  • Responsible for managing data coming from different sources.
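
The Python and Flask work above is illustrated by the hedged sketch below: a small status endpoint that accepts an XML payload. The route, XML fields and the in-memory status lookup are hypothetical placeholders, not the actual Gates/IEX application.

    # Hedged Flask sketch of a small status API that accepts an XML payload
    # (endpoint name, XML fields and the status lookup are hypothetical).
    import xml.etree.ElementTree as ET

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Placeholder in-memory store standing in for the real backend.
    APPLICATION_STATUS = {"GATE-1001": "APPROVED", "GATE-1002": "PENDING"}

    @app.route("/status", methods=["POST"])
    def check_status():
        # Clients post an XML document containing an application id.
        root = ET.fromstring(request.data)
        app_id = root.findtext("applicationId")
        status = APPLICATION_STATUS.get(app_id, "NOT FOUND")
        return jsonify({"applicationId": app_id, "status": status})

    if __name__ == "__main__":
        app.run(debug=True)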

Environment: Big Data Technologies (Hadoop, Pig, Hive, Sqoop, HBase, Hadoop MapReduce), MS SQL Server 2014, SSIS Packages, SQL BI Suite (SSMS, SSIS, SSRS, SSAS), XML, MS Excel, MS Access 2013, Windows Server 2012, SQL Profiler, ERwin r8, TFS 2013.

Confidential, Baltimore, MD

Big Data Developer & Analyst

Responsibilities:

  • Worked on Agile Methodologies, participated in daily/weekly team meetings, peer reviewed the development works and provided the technical solutions.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Implemented a data pipeline by chaining multiple mappers using ChainMapper.
  • Created Hive dynamic partitions to load time-series data; wrote complex Hive queries and UDFs (see the partitioning sketch after this list).
  • Experienced in handling different types of joins in Hive such as map joins, bucket map joins and sort merge bucket map joins.
  • Involved in configuring multi-nodes fully distributed Hadoop cluster.
  • Installed and configured Hive and wrote Hive UDFs that helped spot market trends.
  • Extracted and parsed RDF data using a Java API called Sesame from an ontology system called Semaphore, which is used to process unstructured resources and build NLP capabilities.
  • Loaded the final data from both structured and unstructured resources into the Neo4j graph database to facilitate search capabilities on a graph data store.
  • Worked on variety of file formats like Avro, RC Files, Parquet and Sequence File and Compression Techniques.
  • Installation, Configuration, and Administration of Hadoop cluster of major Hadoop distributions such as Cloudera Enterprise (CDH3 and CDH4) and Hortonworks Data Platform (HDP1 and HDP2).
  • Trained and mentored analysts and test teams on the Hadoop framework, HDFS, MapReduce concepts and the Hadoop ecosystem.
  • Loaded data from various data sources into HDFS using Sqoop.
  • Gained very good business knowledge on different category of products and designs within.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
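
The dynamic-partition loading mentioned above was done in HiveQL; the hedged sketch below issues equivalent HiveQL through PySpark so the example stays in one language. Table names, columns and the date expression are hypothetical placeholders.

    # Hedged sketch of dynamic-partition loading for time-series data
    # (the original used plain HiveQL; table and column names are hypothetical).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dynamic-partitions").enableHiveSupport().getOrCreate()

    # Let Hive derive partition values from the data instead of static literals.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS events_by_day (
            event_id STRING,
            payload  STRING
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
    """)

    # The partition column comes last in the SELECT; each row is routed to the
    # partition matching its event_date value.
    spark.sql("""
        INSERT OVERWRITE TABLE events_by_day PARTITION (event_date)
        SELECT event_id, payload, CAST(to_date(event_ts) AS STRING) AS event_date
        FROM raw_events
    """)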

Environment: MapReduce, HDFS, Sqoop, Flume, Linux, Oozie, Hadoop, Pig, Hive, HBase, Hadoop Cluster, Unix

Confidential, Southfield, MI

Data Analyst/SQL Developer

Responsibilities:

  • Worked on EDW (Enterprise Data Warehouse) for Claims Reporting Project.
  • Analyzed the datasets and loaded the data into SQL Server tables for BI reporting project
  • Ensured best practices are applied and integrity of data is maintained through security and documentation
  • Configured and Maintained Report Manager and Report Server for SSRS.
  • Created reports to retrieve data using Stored Procedures that accept parameters depending upon the client requirements
  • Involved in Debugging and Deploying reports on the production server
  • Involved in data management processes and ad hoc user requests
  • Wrote standard T-SQL to perform data validation and create Excel summary reports (pivot tables and charts); an illustrative pandas analogue follows this list.
  • Involved in requirements gathering, source data analysis and identified business rules for data migration and for developing data warehouse/data mart
  • Involved in identifying and defining the data inputs and captured metadata and associated rules from various source of data for ETL Process for data warehouse
  • Worked with Business Analyst to develop business rules that support the transformation of data
  • Involved in the verification of data accuracy within SAS analytical systems and source systems
  • Worked with SAS Datasets and analyzed the data for analytical reporting
  • Created Source to target mapping documents from staging area to Data Warehouse
  • Designed and optimized indexes, views, stored procedures and functions using T-SQL
  • Helped design and implement processes for deploying, upgrading, managing, archiving and extracting data for reporting
  • Performed maintenance duties like performance tuning and optimization of queries, functions and stored procedures
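
The validation checks and pivot summaries above were built with T-SQL and Excel; purely as an illustration of the same idea in Python, here is a hedged pandas sketch. The file name, columns and workbook name are hypothetical placeholders.

    # Illustrative pandas analogue of the T-SQL validation / Excel pivot summaries
    # described above (file and column names are hypothetical).
    import pandas as pd

    claims = pd.read_csv("claims_extract.csv", parse_dates=["service_date"])

    # Basic validation checks: required keys present, paid amounts non-negative.
    missing_ids = claims["claim_id"].isna().sum()
    negative_amounts = (claims["paid_amount"] < 0).sum()
    print(f"Missing claim ids: {missing_ids}, negative paid amounts: {negative_amounts}")

    # Pivot-table style summary: paid amount by month and claim status.
    summary = pd.pivot_table(
        claims,
        values="paid_amount",
        index=claims["service_date"].dt.to_period("M"),
        columns="claim_status",
        aggfunc="sum",
        fill_value=0,
    )
    summary.to_excel("claims_summary.xlsx")  # requires openpyxl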

Environment: SAS 9.2, SAS Datasets, SQL Server 2008/2012, SSIS, SSRS, Tidal Job Scheduler

Confidential, Dallas, TX

SSIS/ETL DEVELOPER

Responsibilities:

  • Extracted, transformed and loaded source data into the respective target tables to build the required data marts.
  • Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server and Flat files.
  • Involved in daily batch loads (Full & Incremental) into Staging and ODS areas, troubleshooting process, issues and errors using SQL Server Integration Services (SSIS).
  • Developed and tested extraction, transformation, and load (ETL) processes.
  • Used various transformations in SSIS data flow and control flow, including For Loop containers and Fuzzy Lookup.
  • Extracted data from databases and spreadsheets, staged it in a single location and applied business logic to load it into the database
  • Implemented Event Handlers and Error Handling in SSIS packages
  • Developed, monitored and deployed SSIS packages.
  • Created complex ETL packages using SSIS to extract data from staging tables into partitioned tables with incremental loads (an illustrative sketch of the incremental-load pattern follows this list).
  • Created logging for ETL load at package level and task level to log number of records processed by each package and each task in a package using SSIS.
  • Involved in complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional work flow of information from source systems to destination systems
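
The incremental loads above were implemented with SSIS packages; the hedged Python sketch below only illustrates the watermark pattern behind them. The connection string, schema, table and column names are hypothetical placeholders.

    # Hedged illustration of the watermark-based incremental load pattern
    # implemented in SSIS above (connection string, tables and columns are
    # hypothetical; this is not the actual package logic).
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=stage-sql;"
        "DATABASE=Staging;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # 1. Read the high-water mark recorded by the previous load.
    cursor.execute(
        "SELECT LastLoadedAt FROM etl.LoadWatermark WHERE TableName = 'FactSales'"
    )
    last_loaded_at = cursor.fetchone()[0]

    # 2. Insert only rows newer than the watermark into the partitioned target.
    cursor.execute(
        """
        INSERT INTO dw.FactSales (SaleId, Amount, ModifiedAt)
        SELECT SaleId, Amount, ModifiedAt
        FROM stage.Sales
        WHERE ModifiedAt > ?
        """,
        last_loaded_at,
    )

    # 3. Advance the watermark so the next run resumes where this one stopped.
    cursor.execute(
        "UPDATE etl.LoadWatermark SET LastLoadedAt = SYSUTCDATETIME() "
        "WHERE TableName = 'FactSales'"
    )
    conn.commit()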

Environment: MS SQL Server 2012, SSIS Package, SQL BI Suite (SSMS, SSIS, SSRS, SSAS), PowerPoint

Confidential

ETL Developer

Responsibilities:

  • Participated in the Software Development Life Cycle (SDLC) processes including Analysis, Design, Coding, Testing and Deployment.
  • Created cubes with dimensions and facts, and calculated measures and dimension members using Multidimensional Expressions (MDX).
  • Involved in building the Enterprise Data Warehouse for Policy Management, Premium Billing, Member Policy Renewal process.
  • Created SQL objects like Tables, Stored Procedures, Functions, and User Defined Data-Types.
  • Created SSIS packages to load data into the Data Warehouse using various SSIS tasks such as Execute SQL Task, Bulk Insert Task, Data Flow Task and File System Task.
  • Developed ETL jobs to load information into Data Warehouse from different relational databases and flat files.
  • Generated different reports using MS SQL Reporting Services
  • Created various Tabular and Matrix Reports using SSRS.

Environment: SQL Server 2005/2008, SSIS, SSRS, SQL Stored Procedures, Flat files, Excel, MS Office
