
Data Engineer/Big Data Engineer Resume


Minneapolis, MN

PROFESSIONAL SUMMARY:

  • A passionate, team-oriented Data Scientist, Data Engineer, Data Analyst, and Visualization Specialist (Tableau, Power BI) with over 6 years of experience in Data Extraction, Data Modeling, Statistical Modeling, Data Mining, Machine Learning, Data Visualization, and the Design, Development, Integration, Implementation, and Maintenance of Business Intelligence and Data Warehousing platforms.
  • Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, and developing data mining and reporting solutions across massive volumes of structured and unstructured data.
  • Extensive experience applying supervised and unsupervised machine learning solutions to various business problems and generating data visualizations using Python.
  • Proficient in working with Pandas, NumPy, and Scikit-learn in Python for developing various machine learning models.
  • Proficient in Machine Learning algorithms and Predictive Modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Neural Networks, Random Forests, Ensemble Models, SVM, KNN, XGBoost, and various clustering approaches.
  • Solid knowledge and experience in Deep Learning techniques, including Feedforward Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), pooling, and regularization.
  • Excellent understanding of best practices of Enterprise Data Warehouse and involved in Full life cycle development of Data Warehousing.
  • Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
  • Excellent proficiency in model validation and optimization with model selection, parameter/hyper-parameter tuning, K-fold cross-validation, Hypothesis Testing, and Principal Component Analysis (PCA); a brief sketch of this workflow follows this list.
  • Implemented and analyzed RNN-based approaches for automatically predicting implicit discourse relations in text, with potential applications in NLP tasks like text parsing, text analytics, text summarization, and conversational systems.
  • Worked on Gradient Boosting decision trees with XGBoost to improve performance and accuracy in solving problems.
  • Worked with numerous data visualization tools in Python, such as matplotlib, seaborn, ggplot, and pygal.
  • Experience in designing visualizations using Tableau software and publishing and presenting dashboards, Storyline on web and desktop platforms.
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.
  • Knowledge and experience working in Waterfall as well as Agile environments, including the Scrum process, using project management tools like ProjectLibre and Jira/Confluence and version control tools such as GitHub.
  • Worked with NoSQL Database including HBase, Cassandra and MongoDB.
  • Hands-on experience with AWS Glue, AWS Athena, and data-specific file formats (ORC, Parquet, etc.).
  • Experienced with translating business requirements into effective AWS data designs and data ingestion patterns.
  • Specifically experienced in metadata tagging and metadata tagging solutions involving files and the data within them.
  • Experience working in Image Processing projects using OpenCV/Python/MATLAB.
  • Knowledge of OpenCV algorithms such as edge detection, histograms, and morphology.
  • Experience in image classification and object detection using CNNs and deep learning frameworks (TensorFlow, PyTorch).
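A minimal sketch of the model-validation workflow referenced in this summary (model selection, hyper-parameter tuning, and K-fold cross-validation with scikit-learn); the dataset is synthetic and the parameter grid is illustrative only, not drawn from any specific project.

```python
# Hedged sketch: hyper-parameter tuning of a random forest with 5-fold
# cross-validation in scikit-learn. Data and grid values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a prepared feature matrix and target column.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Grid-search the hyper-parameters with K-fold (cv=5) cross-validation.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```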

TECHNICAL SKILLS:

Programming Languages: Python, R, H2O.ai, H2O Flow, Bash scripting, SAS Base, SAS Enterprise Miner, Regular Expressions, and SQL (Oracle, MySQL & SQL Server).

Packages and tools: Keras, TensorFlow, Pandas, NumPy, SciPy, Scikit-learn, NLTK, matplotlib, Seaborn, Bokeh, ggplot2, dplyr, data.table, H2O.ai, SparkR, PySpark, OpenCV, and MATLAB.

Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles (Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost)), Time Series Forecasting (ARIMA, Exponential Smoothing), Dimensionality Reduction (Principal Component Analysis (PCA), LDA), Weight of Evidence (WOE) and Information Value, Hierarchical & K-means Clustering, K-Nearest Neighbors, and A/B Testing.

Good knowledge of: Neural Networks, Deep Neural Networks (CNN & RNN), and LSTMs.

Business Intelligence: Tableau, Qlik Sense, Google Cloud Data Studio, Advanced Microsoft Excel, and Power BI.

Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume, Kafka, and Oozie.

Text Mining: Text Pre-Processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis, and Non-negative Matrix Factorization (NMF).

Cloud Technologies: Google Cloud Platform Big Data & Machine Learning modules (Cloud Storage, Cloud Dataflow, Cloud ML, BigQuery, Cloud Dataproc, Cloud Datastore, Bigtable); familiarity with AWS (EMR, EC2, S3).

Version Control: Git

WORK EXPERIENCE:

Confidential, Minneapolis, MN

Data Engineer/Big Data Engineer

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Developed PySpark modules for machine learning & predictive analytics via Python streaming.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Spearheaded chatbot development initiative to improve customer interaction with application.
  • Developed the chatbot using api.ai.
  • Automated CSV-to-chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
  • Conducted studies and rapid plotting, and used advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Worked on customer segmentation using an unsupervised learning technique, clustering (see the sketch after this list).
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
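A minimal sketch of the customer-segmentation clustering mentioned above, written against the PySpark ML API; the S3 path and the feature columns (recency_days, order_frequency, monetary_value) are hypothetical placeholders, not actual project artifacts.

```python
# Hedged sketch: K-means customer segmentation with the PySpark ML pipeline API.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segmentation").getOrCreate()

# Hypothetical customer-metrics table; the path and columns are placeholders.
customers = spark.read.parquet("s3://example-bucket/customer_metrics/")

# Assemble and standardize the numeric features used for clustering.
assembler = VectorAssembler(
    inputCols=["recency_days", "order_frequency", "monetary_value"],
    outputCol="raw_features",
)
scaler = StandardScaler(inputCol="raw_features", outputCol="features")
kmeans = KMeans(k=5, seed=42, featuresCol="features", predictionCol="segment")

# Fit the whole pipeline and attach a segment label to each customer.
model = Pipeline(stages=[assembler, scaler, kmeans]).fit(customers)
segments = model.transform(customers)
segments.groupBy("segment").count().show()
```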

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential, Jersey City, NJ

Data Engineer/Data Analyst

Responsibilities:

  • Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and Space-Time.
  • Coded R functions to interface with Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe NLP framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Built reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER/Studio 9.7.
  • Acted as SME for Data Warehouse related processes.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, and developed and designed POCs using Scala, Spark SQL, and the MLlib libraries.
  • Wrote SQL queries to identify and validate data inconsistencies in data warehouse against source system.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked with the Erwin Data Modeler tool to design data models.
  • Worked closely with Business Analyst and report developers in writing the source to target specifications for Data warehouse tables based on the business requirement needs.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data.
  • Participated in all phases of data mining, data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes (see the sketch after this list).
  • Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
  • Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQLPLUS and PL/SQL.
  • Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object oriented Design) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements, hands-on in various technologies such as Oracle and Business Objects.
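A minimal sketch of the supervised classification work noted above, comparing Logistic Regression, Decision Trees, KNN, and Naive Bayes with cross-validated accuracy; the dataset is synthetic and the model parameters are illustrative rather than taken from the engagement.

```python
# Hedged sketch: cross-validated comparison of several supervised classifiers.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a labeled training set.
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=6),
    "knn": KNeighborsClassifier(n_neighbors=7),
    "naive bayes": GaussianNB(),
}

# 5-fold cross-validated accuracy for each candidate classifier.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```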

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, San Francisco, CA

Business Intelligence (BI) Developer/Tableau Developer

Responsibilities:

  • Met with business stakeholders to understand business requirements and ways to provide solutions through Tableau visualizations.
  • Responsible for Tableau Environment, Security, Architecture setup.
  • Involved in Tableau Server Installation and Configuration, creating users, groups, and projects and assigning access levels.
  • Worked on proofs of concept to establish Tableau as Confidential's BI strategy for enterprise reporting.
  • Created technical specification documents related to Tableau design and architecture.
  • Created several Tableau data extracts with prebuilt calculations, parameters and sets to enable end users to easily utilize the extracts as semantic layer to create their own workbooks.
  • Created various highly interactive dashboards and data visualizations reports for Ad Sales and Program Finance departments.
  • Developed multiple reports/dashboards using Dual Axes charts, Histograms, filled map, Bubble chart, Bar chart, Line chart, Tree map, Box and Whisker Plot, Stacked Bar etc.
  • Performed tuning of tableau workbooks and extracts to make them functional with large sets of data.
  • Created responsive dashboards for Mobile/iPad/PC orientation views.
  • Actively participated in requirement gathering sessions; understood the source system (Connect Suite) data structure and planned the transformation of that data to the target Azure SQL Server.
  • As a Tableau Subject Matter Expert, regularly met with the Tableau developer and user community to understand their concerns and requirements and provide the required solutions and guidance.
  • Reviewing Interactive Data Source (IDS) table structures and redesigning tables and views as required.
  • Implemented Okta security for external/internal users and documented the process.
  • Implemented Tableau performance benchmarks based on Tableau best practices and business requirements.
  • Developed Power BI model used for financial reporting.
  • Expertise in writing complex DAX functions in Power BI and Power Pivot.

Environment: Python 2.7, Cassandra, MySQL, LDAP, Git, Linux, Windows, JSON, jQuery, HTML, XML, CSS, REST, Rally, Bootstrap, JavaScript, AngularJS, Agile, Bitbucket, PyUnit, Microsoft SQL Server Management Studio, DataStax DevCenter, Apache Directory Studio, Ansible, Jenkins, Matplotlib, MOCK, Beautiful Soup, PyTest, Tableau, Power BI

Confidential

Python Developer

Responsibilities:

  • Worked on the project from gathering requirements to developing the entire application.
  • Worked in the Anaconda Python environment.
  • Created, activated, and programmed in Anaconda environments.
  • Wrote programs for performance calculations using NumPy and SQLAlchemy.
  • Wrote Python routines to log into websites and fetch data for selected options.
  • Used the Python modules urllib, urllib2, and Requests for web crawling (see the sketch after this list).
  • Extensive experience in Text Analytics, developing different statistical machine learning and data mining solutions to various business problems and generating data visualizations using R, Python, and Tableau; used other packages such as Beautiful Soup for data parsing.
  • Involved in the development of web services using SOAP for sending and receiving data from the external interface in XML format.
  • Worked on development of SQL and stored procedures on MYSQL.
  • Analyzed the code completely and reduced code redundancy to an optimal level.
  • Designed and built a text classification application using different text classification models.
  • Used Jira for defect tracking and project management.
  • Worked on writing to, as well as reading from, CSV and Excel file formats.
  • Involved in sprint planning sessions and participated in daily Agile Scrum meetings.
  • Conducted the daily scrum as part of the Scrum Master role.
  • Developed the project in Linux environment.
  • Worked on resulting reports of the application.
  • Performed QA testing on the application.
  • Held meetings with the client and worked on the entire project with limited help from the client.
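A minimal sketch of the kind of web-crawling routine described above, written with Requests and Beautiful Soup; the URL, CSS selectors, and output columns are hypothetical placeholders rather than details from the actual project (which also used urllib/urllib2).

```python
# Hedged sketch: crawl a paginated listing and write parsed rows to CSV.
import csv
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/reports"  # hypothetical source site


def fetch_rows(page):
    """Fetch one listing page and parse the rows of its results table."""
    response = requests.get(BASE_URL, params={"page": page}, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for tr in soup.select("table.results tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append({"name": cells[0], "value": cells[1]})
    return rows


# Crawl a few pages and persist the parsed records for downstream use.
with open("results.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["name", "value"])
    writer.writeheader()
    for page in range(1, 4):
        writer.writerows(fetch_rows(page))
```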

Environment: Python, Anaconda, Spyder (IDE), Windows 7, Teradata, Requests, urllib, urllib2, Beautiful Soup, Tableau, Python libraries such as NumPy and SQLAlchemy, MySQL
