Data Engineer Resume

New York City, NY

SUMMARY:

  • Master’s in Information Technology, with a focus on data mining, data science, and UI development.
  • Professionally qualified Data Scientist/Data Engineer/Data Analyst with over 5 years of experience in data science and analytics, including deep learning/machine learning, data mining, and statistical analysis.
  • Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
  • Strong skills in mathematical and statistical methodologies.
  • Extensively worked in R, Python (NumPy, SciPy, Pandas, Matplotlib, NLTK, Theano, TensorFlow, Scikit-learn), and MATLAB.
  • Experience implementing data analysis with various analytic tools, such as Anaconda Jupyter Notebook, R (ggplot2, caret, dplyr), and Excel.
  • Solid ability to write, tune, and optimize SQL queries. Working knowledge of RDBMSs such as Oracle, MySQL, and SQL Server, and NoSQL databases such as MongoDB and Cassandra.
  • Strong experience in Big Data technologies such as Spark, Spark SQL, PySpark, Hadoop, HDFS, MapReduce, and Hive.
  • Skilled in advanced regression modeling, correlation, multivariate analysis, model building, business intelligence tools, and the application of statistical concepts.
  • Proficient in predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, normal distribution, and other advanced statistical and econometric techniques.
  • Developed predictive models using decision trees, random forests, Naïve Bayes, logistic regression, social network analysis, cluster analysis, and neural networks.
  • Strong SQL programming skills, with experience working with functions, packages, and triggers.
  • Experienced in data integration validation and data quality controls for ETL processes and data warehousing.
  • Proficient in Tableau and R Shiny data visualization tools for analyzing and gaining insight into large datasets, and for creating visually powerful, actionable interactive reports and dashboards.
  • Excellent communication skills; work successfully in fast-paced, multitasking environments, both independently and in collaborative teams. A self-motivated, enthusiastic learner.

TECHNICAL SKILLS:

Languages: Python, R, Scala

Packages: Marvin, Cleanco, ggplot2, caret, dplyr, RWeka, gmodels, Edward, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, TensorFlow, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2.

Big Data Technologies: AWS, Hadoop, Hive, MapReduce.

Databases: MySQL, MS SQL and PostgreSQL.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Spotfire, Business Intelligence, SSRS.

ETL Tools: Talend, Pentaho and AWS Glue ETL.

Version Control Tools: SVN, GitHub and GitLab.

BI Tools: Tableau, Amazon, Google and Kibana.

Operating System: Windows, Linux, Unix.

PROFESSIONAL EXPERIENCE:

Confidential - New York City, NY

Data Engineer

Roles & Responsibilities:

  • Designed database architecture and developed the data platform.
  • Designed and developed data pipelines and automated them as required.
  • Developed a topic extraction (topic modeling) model using LDA and other customized NLP techniques (see the sketch after this list).
  • Developed and automated a data normalization model using Python libraries.
  • Developed a process to generate missing data from source files.
  • Designed and developed a data cooking model to normalize and transform raw data.
  • Gathered requirements from SMEs, the Project Manager, and Business Analysts.
  • Established an AWS bastion host for secure data transactions between the application and database teams.
  • Represented the team at several summits.
  • Created a POC on the MariaDB Audit Plugin, identified several performance issues, and escalated them to the MariaDB research and development team.
  • Designed and developed an ETL (Extraction, Transformation, Loading) model and automated it using the AWS stack (Lambda, Glue, and Step Functions).
  • Created visualizations over Elasticsearch data using Kibana.
  • Extensively used NLP, Random Forest, K-Nearest Neighbors, and Classification and Regression Trees.
  • Gained domain knowledge of chemistry and chemicals and was involved in business decisions.
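A minimal sketch of the LDA-based topic extraction described above, using scikit-learn; the sample documents, topic count, and parameter values are illustrative assumptions, not the production pipeline:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Illustrative documents; the real pipeline read cleaned text from source files.
    docs = [
        "solvent purity affects reaction yield",
        "catalyst temperature controls reaction rate",
        "shipping delays impact chemical inventory",
    ]

    # Bag-of-words term counts (LDA expects raw counts, not TF-IDF weights).
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(docs)

    # Fit LDA with an assumed topic count of 2.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(counts)

    # Print the top terms per extracted topic.
    terms = vectorizer.get_feature_names_out()
    for idx, topic in enumerate(lda.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]
        print(f"Topic {idx}: {', '.join(top)}")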

Environment: Python, SQL, AWS Stack, Spark, R, Docker and Linux.

Confidential - Dallas, Texas

Data Scientist (Machine Learning)

Roles & Responsibilities:

  • Performed data profiling to learn about behavior across various features such as traffic pattern, location, date, and time.
  • Worked on various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume, using the scikit-learn package in Python and MATLAB.
  • Prepared multi-class classification data for modeling using one-hot encoding, and applied unsupervised and supervised learning methods to analyze high-dimensional data (see the sketch after this list).
  • Carried out univariate, bivariate, and multivariate analyses in ggplot2.
  • Analyzed which variables were or were not correlated with quality.
  • Tested whether a change in website layout created a statistically significant improvement in conversion rate.
  • Compared the resulting p-value with one derived from logistic regression.
  • Created scatter plots and pie charts from complex structured data.
  • Analyzed the confusion matrix, IoU (Intersection over Union), ROC (Receiver Operating Characteristic) curve, and precision-recall curve, among others, for parameter tuning and model evaluation; analyzed learning models using TensorBoard.
  • Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
  • Performed multinomial logistic regression, random forest, decision tree, and SVM modeling to classify whether a package would be delivered on time on a new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation. Developed a MapReduce pipeline for feature extraction using Hive and Pig.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using predictive analytics.
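A minimal sketch of the one-hot encoding and multinomial classification workflow described above, using scikit-learn; the feature names, labels, and data are illustrative assumptions:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Illustrative data; the real features covered route, traffic, and timing fields.
    df = pd.DataFrame({
        "route":   ["A", "B", "A", "C", "B", "C"] * 10,
        "weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"] * 10,
        "label":   ["early", "on_time", "late", "on_time", "late", "early"] * 10,
    })

    # One-hot encode the categorical features for the multi-class target.
    X = pd.get_dummies(df[["route", "weekday"]])
    y = df["label"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)

    # Multinomial logistic regression over the encoded features.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)

    # Precision/recall per class for model evaluation.
    print(classification_report(y_test, clf.predict(X_test)))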

Environment: Python, HDFS, Hadoop, Hive, AWS, Linux, Spark, Tableau, SQL Server, Microsoft Excel, Spark SQL and PySpark.

Confidential - Valparaiso, IN

Data Scientist / Data Engineer

Roles & Responsibilities:

  • Designed and developed an e-commerce website using a CMS, HTML, CSS, JavaScript, and PHP.
  • Performed a frontend and backend rewrite of the existing CMS system using PHP, jQuery, C#, JavaScript, Bootstrap, and ASP.NET.
  • Analyzed sales from the past few years using data mining tools such as RStudio, Python, and Hadoop.
  • Developed a database containing tables, stored procedures, functions, views, triggers, and indexes in SQL Server and connected it to the existing CMS system.
  • Studied sales trends from past years using data science techniques.
  • Created sales summaries of past years using visualization techniques in Excel, Power BI, and Tableau (see the sketch after this list).
  • Extracted data from a third-party application and conducted data preprocessing and data mining using R and Python.
  • Created models using R and Python packages.
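A minimal sketch of the yearly sales summary described above, using pandas and Matplotlib; the records and column names are illustrative assumptions (the real data came from the SQL Server database behind the CMS):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Illustrative sales records; column names are assumed for this sketch.
    sales = pd.DataFrame({
        "order_date": pd.to_datetime(
            ["2014-03-01", "2014-11-15", "2015-06-20", "2015-09-02", "2016-01-10"]),
        "revenue": [1200.0, 800.0, 1500.0, 700.0, 2000.0],
    })

    # Aggregate revenue by year.
    yearly = sales.groupby(sales["order_date"].dt.year)["revenue"].sum()

    # Plot the yearly summary as a bar chart.
    yearly.plot(kind="bar", title="Revenue by Year")
    plt.xlabel("Year")
    plt.ylabel("Revenue")
    plt.tight_layout()
    plt.show()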

Environment: Python, R/RStudio, HDFS, Hadoop 2.3, AWS, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Power BI.

Confidential

Data Engineer

Roles & Responsibilities:

  • Used ETL tools such as Talend and Pentaho.
  • Extracted, transformed, and loaded data from given sources for analysis.
  • Hands-on implementation of R, Python, Hadoop, Tableau, and SAS to extract and import data.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time (see the sketch after this list).
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQL context.
  • Gained extensive knowledge of MapReduce using Python, Sqoop queries, Pig scripts, and Hive queries.
  • Gained hands-on experience with the Amazon Redshift platform.
  • Performed data cleaning, applying backward- and forward-fill methods to handle missing values in the dataset.
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs.
  • Explored different regression and ensemble models in machine learning to perform forecasting.
  • Used classification techniques, including Random Forest and Logistic Regression, to quantify the likelihood of each user referring.
  • Applied boosting methods to the predictive model to improve its efficiency.
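A minimal sketch of reading Kafka events into Spark in near real time, shown here with the Structured Streaming API; the broker address and topic name are illustrative assumptions, and the job requires the spark-sql-kafka connector package on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("learner-data-model").getOrCreate()

    # Subscribe to an assumed Kafka topic on an assumed broker.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "learner-events")
           .load())

    # Kafka delivers key/value as binary; cast the payload to string
    # before applying downstream transformations.
    events = raw.select(col("value").cast("string").alias("event"))

    # Write the transformed stream to the console for demonstration.
    query = (events.writeStream
             .outputMode("append")
             .format("console")
             .start())
    query.awaitTermination()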

Environment: R/RStudio, Informatica, SQL/PLSQL, Oracle 10g, MS Office, Tableau.

Confidential

SQL developer / DBA

Roles & Responsibilities:

  • Participated in the analysis, design, development, testing, and implementation of various financial systems using Oracle Developer and PL/SQL. The system consisted of various functional modules.
  • Defined database structures, mappings, and transformation logic; created external table scripts for loading data from source systems for ETL (Extract, Transform, Load) jobs.
  • Wrote UNIX shell scripts to run database jobs on the server side.
  • Developed new packages, database triggers, stored procedures, and other code modules, and modified existing ones, using PL/SQL in support of business requirements.
  • Worked with various functional experts to translate their functional knowledge into business rules implemented as working code modules such as procedures and functions.
  • Used TOAD and SQL Navigator extensively.

Environment: Oracle, SQL, PL/SQL, UNIX.
