
Data Scientist Resume


Long Island, NY

SUMMARY

  • 8 years of professional experience in Data Engineering, responsible for data modeling, data migration, design, and building ETL pipelines for the cloud.
  • Expertise in providing solutions across Data Science, Machine Learning, NLP, Predictive Analytics, Statistical Modeling, Business Intelligence, Optimization Techniques, and Data Visualization.
  • Proficient at writing MapReduce jobs and UDFs to analyze, transform, and deliver data as per requirements.
  • Knowledge of Azure Data Factory (ADF), Azure SQL DB, SSIS, Azure Synapse and Azure Data Lake Storage to provide big data architectural solutions and services to the clients and business stakeholders.
  • Hands-on experience with UNIX and shell scripting for big data development.
  • Experience in building ETL pipelines, scheduling jobs, running data flow activities and Databricks/Jupyter notebooks, and storing data in secure and efficient storage such as Azure Blob Storage, Azure Data Lake Storage (ADLS Gen1 & Gen2), and Azure SQL DW.
  • Experienced in using Talend for data integration; created automated ETL jobs in Talend.
  • Strong Hadoop and platform support experience with the entire suite of tools and services in major Hadoop distributions - Cloudera, Hortonworks, Amazon EMR, Azure HDInsight.
  • Hands-on experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka (see the sketch after this list).
  • Sound knowledge in developing highly scalable and resilient RESTful APIs, ETL solutions, and third-party platform integrations as part of an enterprise site platform.
  • Expertise in building PySpark and Scala applications for interactive analysis, batch processing, and stream processing; experience in building production ETL pipelines between several source systems and an Enterprise Data Warehouse by leveraging Informatica PowerCenter, SSIS, SSAS, and SSRS.
  • Good experience with SQL and NoSQL databases, data modeling and data pipelines. Involved in end-to-end development and automation of ETL pipelines using SQL and Python.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experience migrating data between RDBMS and HDFS using Sqoop.
  • Hands-on Experience with AWS cloud (EMR, EC2, RDS, EBS, S3, Lambda, Glue, Elasticsearch, Kinesis, SQS, DynamoDB, Redshift, ECS).
  • Expertise in creating Kubernetes clusters with CloudFormation templates and PowerShell scripting to automate deployment in a cloud environment.
  • Working Experience on Azure cloud components (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, Cosmos DB).
  • Proficient in writing Spark scripts in Python, Java, Scala, and SQL for development and analysis.
  • Experience using Spark APIs for streaming real-time data, staging, cleansing, applying transformations, and preparing data for machine learning needs.
  • Worked with various streaming ingestion services with batch and real-time processing using Spark Streaming, Confluent Kafka, Storm, Flume, and Sqoop.
  • Experience in designing interactive dashboards, reports, ad-hoc analysis, and visualizations using Tableau, Power BI, Arcadia, Matplotlib, and QlikView.
  • Collaborated with engineering teams to support analytics platforms and enable effective decision-making.
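
As a concrete illustration of the streaming work noted above, the following is a minimal PySpark Structured Streaming sketch that reads order events from Kafka, parses and cleanses them, and lands them in Parquet. The broker address, topic name, schema, and output paths are hypothetical placeholders, not details of any actual client pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Hypothetical schema for incoming order events.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders")
       .load())

# Parse the JSON payload and keep only well-formed records.
orders = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("o"))
          .select("o.*")
          .filter(col("order_id").isNotNull()))

# Land the cleansed stream in Parquet (in practice an ADLS/Blob or S3 path).
query = (orders.writeStream
         .format("parquet")
         .option("path", "/data/curated/orders")
         .option("checkpointLocation", "/data/checkpoints/orders")
         .start())
query.awaitTermination()
```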

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop, MapReduce, Sqoop, Hive, Oozie, Spark, ZooKeeper, Cloudera Manager, Kafka, Flume.

ETL Tools: Informatica, Talend

NoSQL Databases: HBase, Cassandra, DynamoDB, MongoDB.

Monitoring and Reporting: Power BI, Tableau, custom shell scripts

Hadoop Distribution: Hortonworks, Cloudera

Application Servers: Apache Tomcat, JDBC, ODBC

Machine Learning: Regression, Logistic Regression, Decision Trees, Random Forests, Clustering, Principal Component Analysis, NLP, Time Series, Optimization.

Other Tools: Spring Boot, Jupyter Notebook, Terraform, Docker, Kubernetes, Jenkins, Ansible, Splunk, Jira

Programming & Scripting: Python, SAS, SPSS, Scala, Java, SQL, Shell Scripting, C, C++

Databases: Oracle, MySQL, Teradata

Version Control: Git

IDE Tools: Eclipse, Jupyter, Anaconda

Operating Systems: Linux, Unix, Mac OS-X, CentOS, Windows 10, Windows 8, Windows 7

Cloud Computing: AWS, Azure

Cluster Managers: Docker, Kubernetes

Development Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential - Long Island, NY

Data Scientist

Responsibilities:

  • As part of the Modelling, Analytics and Visualization team, built and supported various predictive models and analytics dashboards using technologies/tools such as Python, R, SQL, and Power BI.
  • As part of the client's Fraud Analytics team, created fraud detection models and continuously monitored their performance.
  • Created both rule-based and model-based fraud detection models to identify fraudulent transactions.
  • Created ad-hoc reports using SQL and Python and automated their delivery as emails using crontab (see the sketch after this list).
  • Created jobs to monitor all orders and flag those that might be risky or fraudulent.
  • Extensively worked with T-SQL (MS SQL Server). Involved in creating database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide structure and maintain data efficiently.
  • Worked on storage and analytics services in Azure cloud platform.
  • Wrote in-house UNIX shell scripts for big data development.
  • Created Automated ETL jobs in Talend and pushed the data to Azure SQL data warehouse.
  • Used Azure Data Factory with the SQL and Mongo APIs to integrate data from MongoDB, MS SQL, and cloud storage (Blob, Azure SQL DB). Involved in developing automated workflows for daily incremental loads, moving data from traditional RDBMS to the Data Lake.
  • Monitored the Spark cluster using Log Analytics and the Ambari Web UI. Transitioned log storage from MS SQL to Cosmos DB and improved query performance.
  • Exposure to software development practices - Agile/Scaled Agile Framework - and DevOps practices including CI/CD.
  • Expertise in transforming business requirements into analytical problems, designing solutions, building models, developing data mining, and reporting solutions that scale across a massive volume of data.
  • Expertise in the Big Data ecosystem; designed and automated custom-built input adapters using Spark, Sqoop, and Oozie to ingest and analyze data from RDBMS into Azure Data Lake.
  • Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
  • Developed Spark Scala scripts for mining data and performed transformations on large datasets to provide real time insights and reports.
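
The bullet on automated ad-hoc reports above can be illustrated with the following minimal sketch: a Python script that pulls a daily report via T-SQL and emails it, run from a crontab entry such as `0 7 * * * /usr/bin/python3 /opt/reports/report_email.py`. The connection string, query, addresses, and SMTP host are placeholders, not the client's actual configuration.

```python
# report_email.py - minimal sketch of an ad-hoc report automated via crontab.
import smtplib
from email.message import EmailMessage

import pandas as pd
import pyodbc

# Placeholder connection string and query (hypothetical table and columns).
CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=db-host;DATABASE=orders;UID=report;PWD=***"
QUERY = """
    SELECT order_id, amount, risk_flag
    FROM dbo.orders
    WHERE order_date = CAST(GETDATE() - 1 AS DATE);
"""

def build_report() -> pd.DataFrame:
    # Pull yesterday's orders into a DataFrame.
    with pyodbc.connect(CONN_STR) as conn:
        return pd.read_sql(QUERY, conn)

def send_report(df: pd.DataFrame) -> None:
    # Attach the report as CSV and send it through an internal SMTP relay.
    msg = EmailMessage()
    msg["Subject"] = "Daily orders report"
    msg["From"] = "reports@example.com"
    msg["To"] = "analytics-team@example.com"
    msg.set_content(f"{len(df)} orders attached.")
    msg.add_attachment(df.to_csv(index=False).encode(), maintype="text",
                       subtype="csv", filename="orders.csv")
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    send_report(build_report())
```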

Confidential - Cincinnati, Ohio

Data Engineer

Responsibilities:

  • Extracted, cleansed, and transformed data using Databricks and Spark.
  • Worked extensively on data transformation, mapping, cleansing, monitoring, debugging, performance tuning, and troubleshooting of Hadoop clusters.
  • Worked with the data science team on preprocessing and feature engineering and supported machine learning algorithms running in production.
  • Designed and implemented test environment on AWS.
  • Registered datasets in AWS Glue through its REST API.
  • Used AWS API Gateway to trigger Lambda functions. Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation.
  • Integrated and developed PySpark applications as an ETL tool.
  • Used DynamoDB to store metadata and logs.
  • Queried data residing in AWS S3 buckets with Athena.
  • Implemented Spark on AWS EMR using PySpark, utilizing the DataFrame and Spark SQL APIs for faster data processing (see the sketch after this list).
  • Used AWS Step Functions to run data pipelines.
  • Monitored and managed services with AWS CloudWatch.
  • Used Dremio on AWS as a query engine for faster joins and complex queries over AWS S3 buckets.
  • Performed transformations using Apache Spark SQL.
  • Utilized separate VMs for development and production environments.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Created REST APIs using Flask in Python (see the sketch after this list).
  • Reduced access time through data model refactoring and query optimization, and implemented a Redis cache to support Snowflake.
  • Experienced in automating, configuring, and deploying instances on AWS environments and data centers; also familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.
  • Built interactive Power BI dashboards and worked on reporting analytics.
  • Extensively used Databricks notebooks for interactive analytics using Spark APIs.
  • Supported analytical platform, handled data quality, and improved the performance using Scala’s higher order functions, lambda expressions, pattern matching and collections.
  • Implemented scalable microservices to handle concurrency and high traffic. Optimized existing Scala code and improved the cluster performance.
  • Involved in building an Enterprise Data Lake using Data Factory and Blob Storage, enabling other teams to work with more complex scenarios and ML solutions.
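
As an illustration of the PySpark/EMR validation and aggregation work referenced above, the following is a minimal sketch; the S3 paths, column names, and rules are hypothetical rather than taken from the actual pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("campaign-cleanse").getOrCreate()

# Read raw campaign files landed in S3 (Parquet assumed here; paths are placeholders).
raw = spark.read.parquet("s3://example-bucket/raw/campaigns/")

# Basic validation: drop duplicates, require a key, and reject negative spend.
clean = (raw.dropDuplicates(["campaign_id"])
            .filter(F.col("campaign_id").isNotNull())
            .filter(F.col("spend") >= 0)
            .withColumn("load_date", F.current_date()))

# Custom aggregation with Spark SQL before loading downstream (e.g. into Redshift via Glue).
clean.createOrReplaceTempView("campaigns")
daily_spend = spark.sql("""
    SELECT load_date, channel, SUM(spend) AS total_spend
    FROM campaigns
    GROUP BY load_date, channel
""")

daily_spend.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_spend/")
```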
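
Similarly, the Flask REST API bullet can be sketched as below; the routes, payload fields, and scoring logic are placeholders for illustration only.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/orders/<order_id>/risk", methods=["GET"])
def order_risk(order_id: str):
    # In a real service this would look up the order and apply a model;
    # a fixed response stands in for that logic here.
    return jsonify({"order_id": order_id, "risk_score": 0.12})

@app.route("/orders", methods=["POST"])
def create_order():
    payload = request.get_json(force=True)
    # Validate the minimal required field before accepting the record.
    if "order_id" not in payload:
        return jsonify({"error": "order_id is required"}), 400
    return jsonify({"status": "accepted", "order_id": payload["order_id"]}), 201

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```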

Confidential

Data Scientist

Responsibilities:

  • Developed a research report on the performance of private equity funds by applying regression analysis, which helped improve the training material for a batch of 5 new intern hires.
  • Evaluated strategic alternatives based on prediction models to advise clients on debt and equity fundraising, which resulted in an improved return on equity.
  • Created an accurate footfall prediction model for the client which helped alter their marketing campaign and thereby improved revenue by 10%.
  • Delivered a 20% increase in client website engagement through A/B testing of email campaigns and a 17% decrease in churn rate.
  • Clustered real-estate data into 3 regions using k-means clustering in R, with 80% homogeneous clusters, and assigned ranks to each region; this data, along with other data, is used to train a machine learning model (see the sketch after this list).
  • Created visualizations of rental history, ARV history, and ad-hoc visualizations based on requirements, which are used to make business decisions.
  • Maintained client relations, prospected for off-market deals, and managed an investment sales pipeline.
  • Worked with the data science team running machine learning models on a Spark EMR cluster and delivered the data needed as per business requirements.
  • Created Spark jobs on Databricks to perform tasks such as data cleansing, data validation, and standardization, and then applied transformations as per the use cases.
  • Built a time-critical dashboard for a client using Power BI and Teradata, utilizing data from internal sources. The dashboard helps traders make decisions in Energy & Commodities trading and replaced a paid third-party BI service, thereby reducing operational costs.
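
The clustering bullet above can be sketched as follows; the original work was done in R, but the sketch is shown in Python/scikit-learn for consistency with the other examples here, and the input file and feature columns are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load real-estate records with a few numeric features (illustrative columns).
df = pd.read_csv("listings.csv")
features = df[["price", "rent", "sqft", "days_on_market"]].dropna()

# Standardize so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# Cluster into 3 regions, as in the bullet above.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
features["region"] = kmeans.fit_predict(scaled)

# Rank regions (e.g. by average rent) so downstream models can use the rank as a feature.
region_rank = features.groupby("region")["rent"].mean().rank(ascending=False).astype(int)
features["region_rank"] = features["region"].map(region_rank)
print(features[["region", "region_rank"]].value_counts())
```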

Confidential

Business Intelligence Analyst

Responsibilities:

  • Built a regression model to forecast employee relocation expenditures and identified the drivers that influence relocation costs the most (see the sketch after this list).
  • Created on-demand analytical reports through data extraction and transformation using SQL that provided insights to solve business problems.
  • Provided department-wide business intelligence by enabling reporting and dashboarding in SSRS and Tableau.
  • Examined data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integration.
  • Designed and developed relational databases using various datasets and deployed them on servers.
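
A minimal sketch of the relocation-cost regression described above; the input file, feature columns, and values are hypothetical, and the actual model may have used different variables.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("relocations.csv")

# Candidate drivers of relocation cost (illustrative columns).
X = sm.add_constant(df[["distance_miles", "household_size", "home_owner", "salary_band"]])
y = df["relocation_cost"]

# Ordinary least squares fit; coefficient sizes and p-values indicate
# which drivers influence cost the most.
model = sm.OLS(y, X).fit()
print(model.summary())

# Forecast expenditure for a planned relocation (hypothetical values).
new_case = pd.DataFrame({"const": [1.0], "distance_miles": [850], "household_size": [3],
                         "home_owner": [1], "salary_band": [4]})
print(model.predict(new_case))
```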
