Lead Data Engineer Resume

Memphis, TN

SUMMARY

  • Microsoft Certified Data Science professional with 8 years of experience across all phases of diverse technology projects, specializing in Data Science, Python, SQL Server, and Tableau. Team builder with excellent communication, time and resource management, and client relationship development skills.
  • Extensive experience in Data Mining, Machine Learning, and Data Science technologies, ensuring project completion on time, on budget, and with the desired results. Highly creative, innovative, committed, intellectually curious, and business-savvy, with strong communication and interpersonal skills.
  • Analytical, performance-focused, and detail-oriented professional, offering in-depth knowledge of data analysis and statistics.
  • Experienced in statistical techniques including correlation, hypothesis testing, and inferential statistics, as well as data mining and modeling techniques using linear and logistic regression, decision trees, and k-means clustering.
  • Expertise in building supervised and unsupervised machine learning experiments in Microsoft Azure, applying multiple algorithms for detailed predictive analytics and publishing models as web services for all types of data: continuous, nominal, and ordinal.
  • Expertise in linear and logistic regression; completed multiple projects applying these techniques (see the sketch at the end of this summary).
  • Mitigated risk factors through careful analysis of financial and statistical data. Transformed and processed raw data for further analysis, visualization, and modeling.
  • Increased customer visibility by delivering real-time insights to salespeople and sales managers through Tableau reports, boosting revenue by 10%.
  • Proficient in researching current processes and emerging technologies that require analytic models, including their data inputs and outputs, analytic metrics, and user interface needs.
  • Good knowledge of proofs of concept (PoCs) and gap analysis; gathered data for analysis from different sources and prepared it for exploration using data munging.
  • Microsoft Certification in Essentials of Data Science
  • Descriptive and inferential statistics
  • Linear and logistic regression models
  • Experience creating and updating Tableau dashboards
  • Ability to work independently and own end-to-end responsibility for project execution.
  • Ability to manage and work with both technical and non-technical teams.
  • Experience working with both structured and unstructured datasets in high-volume, high-velocity production environments.
  • Curious, analytical mind with the ability to identify and assess key business risks and controls.
  • Machine Learning - K-Means Clustering, Principal Component Analysis (PCA), Factor Analysis, Time Series
  • Strength in Machine Learning, Statistical Modeling, Data Mining, Pattern Recognition, Information Retrieval, Natural Language Processing, and Search Ranking
  • Experience working with large databases
  • Experience with predictive analysis and data visualization techniques using relevant tools (e.g. Tableau, R or Python)
  • Ability to solve previously unseen problems with no prior examples to reference for guidance.
  • Passionate about taking action to deliver any needed risk and control improvements.
  • Knowledge of Microsoft Office applications (Excel, PowerPoint, and Word).
  • Knowledge of systems, master data and data architecture to pull data from numerous tables from multiple sources to be merged and analyzed.
  • Knowledge of scripting languages and software - Unix shell, R, Python
  • Engaging cross-functional teams of business stakeholders in understanding their prioritization and implementation needs.
  • Balancing analytics with sufficient business acumen to translate complex business processes into IT requirements for automation.
  • Experience with open-source Machine Learning and Artificial Intelligence libraries such as Spark MLlib, H2O.ai, and TensorFlow.
  • Good knowledge of Data Warehouses, RDBMS, and MPP databases, including query optimization and performance tuning.
  • Working knowledge of large-scale/distributed SQL, Hadoop, NoSQL, HBase, and columnar databases.
  • Experience with MS Visual Basic, MS Excel, MS PowerPoint, and .NET.
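
As a minimal illustration of the linear and logistic regression work listed above, the following Python sketch fits and evaluates a logistic regression model with scikit-learn. It is a hedged sketch on synthetic data; the features, target rule, and split are illustrative assumptions, not taken from any client project.

    # Minimal sketch: fit and evaluate a logistic regression classifier.
    # Synthetic data only; the features and target rule are hypothetical.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 4))                  # four numeric features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic binary target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))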

TECHNICAL SKILLS

  • Cross-Functional Supervision, Team Building & Mentoring
  • Client Relations & Presentations, Business & IT Planning, Vendor Management
  • Others: Data Science Analyst, Modeling, Marketing, Data Analyst
  • Azure Data Engineer Associate.

PROFESSIONAL EXPERIENCE

Confidential, Memphis, TN

Lead Data Engineer

Responsibilities:

  • Responsible for building big data, AI, and data science platforms and global architecture at Confidential Tradetools/Trade Networks.
  • Built large, scalable data pipelines for machine learning.
  • Automated ETL processes, making it easier to wrangle data and reducing processing time by as much as 40%.
  • Delivered a high-performing streaming data solution with 300x improved performance for the compliance product.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data. Assessed the current production state of the application and determined the impact of new implementations on existing business processes.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
  • Developed Spark applications using PySpark and Spark SQL to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns (see the sketch after this list).
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
  • Hands-on experience developing SQL scripts for automation purposes.
  • Responsible for maintaining quality reference data in Oracle by performing operations such as cleaning and transformation, and by ensuring integrity in a relational environment.
  • Performed all post-scan production processes, specifically image conversion, OCR, endorsements, etc., using Doculex and IPRO.
  • Transformed and cleaned data and built models using Dataiku.
  • Deployed and scored models using Dataiku.
  • Experience with the Azure cloud stack, including Azure Data Lake Storage, Azure Data Factory, Synapse Analytics, Event Hub, HDInsight, and Power BI.
  • Used Tesseract to extract information from images, normalized the images, and built machine learning models on the extracted text.
  • Built models using NLP, regression, decision trees, and random forests; evaluated, validated, and deployed them into production.
  • Good understanding of, and proven experience with, performance tuning, scalability, and security in Microsoft Azure environments.
  • Performed tokenization using spaCy in Python.
  • Optimized the end-to-end flow.
  • Implemented metrics and checks to monitor data quality.
  • Automated data ingestion and scoring.
  • Monitored model performance over time.
  • Defined the model retraining strategy.
  • Deployed projects to production.
  • Managed versioning and rollback of projects in production.
  • Deployed a real-time scoring model.
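
A minimal sketch of the kind of PySpark/Spark SQL aggregation described in the list above. The paths, column names, and aggregation logic are hypothetical placeholders, not the actual pipeline; it assumes a Databricks-style environment where the storage paths are mounted.

    # Minimal PySpark sketch: read two file formats, join, transform, aggregate.
    # All paths and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

    events = spark.read.json("/mnt/datalake/raw/events/")    # hypothetical ADLS mount
    users = spark.read.parquet("/mnt/datalake/raw/users/")

    # Aggregate events per customer per day to surface usage patterns.
    usage = (
        events.join(users, "user_id")
              .withColumn("event_date", F.to_date("event_ts"))
              .groupBy("customer_id", "event_date")
              .agg(F.count("*").alias("events"),
                   F.countDistinct("session_id").alias("sessions"))
    )

    usage.write.mode("overwrite").parquet("/mnt/datalake/curated/usage/")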

Confidential, Columbus, OH

Data Scientist

Responsibilities:

  • Collaborated across teams to determine appropriate data sources and developed data extraction and business rules for use with visualization reporting tools.
  • Acquired data from databases using Teradata.
  • Wrote queries to fetch, summarize, and combine data from multiple sources.
  • Wrote complex SQL queries for business needs.
  • Analyzed business problems and created predictive/classification models to solve them.
  • Performed data extraction, data manipulation, data cleaning, analysis, modeling, and data mining using Python.
  • Strong mathematical knowledge and hands-on experience implementing machine learning algorithms such as K-Nearest Neighbors, Logistic Regression, Linear Regression, Naïve Bayes, Support Vector Machines, Decision Trees, and Random Forests.
  • Performed data mining techniques such as classification, clustering, and outlier detection, and derived new insights from the data during exploratory analysis.
  • Used PCA for dimensionality reduction and performed feature scaling for machine learning (see the sketch after this list).
  • Performed data manipulation and analysis using Pandas and performed data visualization using Matplotlib in Python in the process of estimating product demand.
  • Experience in developing Shell Scripts for system management and automating routine tasks.
  • Built, evaluated, and validated models and deployed them into production.
  • Performed hypothesis testing.
  • Involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Created and maintained Tableau reports displaying the status and performance of deployed models and algorithms.
  • Built intelligent decision models using decision trees, segmentation, regression, and clustering to analyze customer response behaviors, interaction patterns, and propensity.
  • Applied different algorithms and techniques to fit the correct model to each problem.
  • Created interactive dashboards using Tableau.
  • Built device segmentation and classification models using Decision Trees, K-means, KNN, and Random Forests.
  • Used machine learning models for fraud detection and device testing.
  • Participated in data table and dataset design, engineering, and code reviews.
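
The feature scaling and PCA step called out above can be chained with a classifier in a single pipeline. Below is a minimal scikit-learn sketch on synthetic data; the component count and classifier choice are illustrative assumptions, not the project's actual settings.

    # Minimal sketch: scale features, reduce dimensionality with PCA,
    # then classify. Synthetic data; all sizes are hypothetical.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale first so PCA components are not dominated by large-valued features.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=10)),   # 30 features -> 10 components
        ("clf", RandomForestClassifier(random_state=0)),
    ])
    pipe.fit(X_train, y_train)
    print("test accuracy:", pipe.score(X_test, y_test))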

Confidential, MN

Senior Data Science Analyst

Responsibilities:

  • Collaborated across teams to determine appropriate data sources and developed data extraction and business rules for use with visualization reporting tools.
  • Created SQL queries to simplify migration progress reports and analyses.
  • Retrieved data from pertinent sources (e.g., databases at customer sites or Merge's internal project management database), then analyzed and presented results via Excel and SQL.
  • Performed data merging, cleaning, and quality control procedures by programming data object rules into a database management system.
  • Acquired, cleaned (using Talend), and structured data from multiple sources, including external and internal databases.
  • Performed data extraction, manipulation, cleaning, analysis, modeling, and data mining using R (in RStudio) and Python.
  • Designed 10+ Tableau dashboards giving sales managers instant access to a personalized analytics portal with key business metrics such as time-to-close and delay-to-contract, increasing customer satisfaction and improving the client's standing in the sales performance management industry.
  • Developed and executed processes for accurate data capture across all clients, using statistical hypothesis modeling to obtain key insights and relationships tied to overall business objectives.
  • Performed exploratory data analysis in R, diving deep into internal and external data to diagnose areas of improvement and increase efficiency.
  • Knowledgeable in Apache Spark and Hadoop, and in developing data processing and analysis algorithms using Python.
  • Utilized Boosted Decision Tree, Linear Regression, and Bayesian Linear Regression machine learning models in Microsoft Azure to develop and implement interactive web-service predictive models.
  • Led the company's machine learning and statistical modeling effort, including building predictive models and generating data products to support customer segmentation, product recommendation, and allocation planning; prototyped and experimented with ML/DL algorithms and integrated them into production systems for different business needs.
  • Designed, built, and deployed a set of R modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction (CLTV, marketing funnel propensity models, etc.) and support multiple marketing segmentation programs.
  • Built intelligent decision models using decision trees, segmentation, regression, and clustering to analyze customer response behaviors, interaction patterns, and propensity.
  • Worked as part of the analytics team to ensure that the business and technical architecture of delivered solutions matched business requirements, supporting continued experimentation and evolving analytics.
  • Validated data sources.
  • Performed cross-validation to test the accuracy and efficiency of the models (see the sketch after this list).
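
A minimal sketch of the cross-validation step referenced above: k-fold cross-validation to estimate model accuracy. The data is synthetic, and k=5 is simply a common default, not necessarily the setting used on the project.

    # Minimal sketch: 5-fold cross-validation of a classifier on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print("fold accuracies:", scores)
    print("mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))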

Confidential, OH

Data Analyst

Responsibilities:

  • Created SQL queries to simplify migration progress reports and analyses.
  • Performed data merging, cleaning, and quality control procedures by programming data object rules into a database management system.
  • Performed data analysis using SQL, gathering raw data and building and deploying customer-centric models to derive insurance-related conclusions.
  • Performed data extraction, manipulation, cleaning, analysis, modeling, and data mining using R and Python.
  • Used Tableau to publish visualizations.
  • Developed category segmentation using R, providing a customizable view of market share and decreasing labor costs by 50%.
  • Documented functional and nonfunctional requirements, process workflows, data maps, table layouts, and their validations.
  • Actively pursued data quality compliance, assessed risk factors, and generated models and scenarios for forecasting operational risk.
  • Administered cross-training sessions with other departments to improve workflow and streamline project completion.

Confidential, NJ

Data Analyst

Responsibilities:

  • Developed R programs to manipulate data read from various Oracle data sources, consolidate it into a single CSV file, and update the content of database tables.
  • Created monthly and quarterly business monitoring reports by writing complex SQL queries that used system calendars, inner joins, and outer joins to retrieve data from multiple tables.
  • Performed data manipulation using Python.
  • Designed easy-to-follow visualizations using Tableau and published dashboards on web and desktop platforms.
  • Worked on new ideas to develop new mapping techniques, speed up the work, and predict statistics based on severity and quality.
  • Created numerous processes and flow charts using Visio to meet the business needs and interacted with business users to understand their data needs
  • Monitored dashboards & application performance.
  • Made recommendations to management for coordinating the daily workflow of the mapping stage and established standard performance benchmarks for the timely processing of core mapping stages.
  • Involved in analyzing, designing, and documenting business requirement specifications to build data warehousing extraction programs, end-user reports, and queries.
  • Worked closely with associates to identify problems and find solutions in the tool.
  • Used extracted data for analysis and carried out various mathematical operations using the Python libraries NumPy and SciPy (see the sketch after this list).
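
A minimal sketch of the kind of NumPy/SciPy calculations mentioned above: descriptive statistics plus a two-sample t-test. The data and group labels are synthetic placeholders, not figures from the engagement.

    # Minimal sketch: descriptive stats with NumPy, hypothesis test with SciPy.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    a = rng.normal(loc=100.0, scale=15.0, size=500)   # hypothetical metric, group A
    b = rng.normal(loc=103.0, scale=15.0, size=500)   # hypothetical metric, group B

    print("mean A: %.2f  mean B: %.2f" % (a.mean(), b.mean()))
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
    print("t = %.3f, p = %.4f" % (t_stat, p_value))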

Confidential

Data Analyst

Responsibilities:

  • Consolidated & visualized million row of data set and developed complex queries/scripts on SSIS, T-SQL, Excel Macro to perform ETL and data analyses.
  • Utilized exploratory & confirmatory data analytics to investigate classified/clustering data sets on daily basis.
  • Helped team director maintain coefficients/parameters and performed ad hoc analyses like multivariate regression analysis, time series & machine learning techniques to build business intelligence.
  • Maintained risk factors/coefficient models, adjusted risk testing scales and back-tested trading/portfolios data on VAR models to manage quantitative risks on R, SAS, MATLAB, Mathematica.
  • Used extracted data for analysis and carried out various mathematical operations for calculation purpose using python library - NumPy, SciPy
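
A minimal sketch of an ad hoc multivariate regression like those described above, using statsmodels OLS on synthetic data. The variables and coefficients are hypothetical, not the risk factors actually modeled.

    # Minimal sketch: ordinary least squares with three explanatory variables.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))                        # three explanatory variables
    y = 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=300)

    model = sm.OLS(y, sm.add_constant(X)).fit()          # intercept + 3 slopes
    print(model.summary())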
