
Sr Big Data Engineer Resume

Redmond, WA


  • 7 years of Big Data Engineer experience
  • An accomplished Big Data Engineer with over seven years of experience applying a strong working knowledge of big data system architecture and ETL pipeline construction to a variety of real-world business problems, yielding lean, actionable results and insights for improvement.
  • A highly organized and efficient individual whose leadership and thorough, precise approach to projects has yielded excellent results.
  • Expert Python / Java / Scala developer specializing in developing and deploying big data solutions.
  • Always on top of the current trends in relevant technologies, shifts in the big data climate, and improvements in existing methodologies.
  • Strong leadership skills with specific experience in the Agile framework.
  • Ability to take data engineering beyond the proof-of-concept stage and into full production and deployment.
  • Extensive experience with third party cloud resources: AWS, Google Cloud, Azure
  • Expertise in all common data engineering techniques: A/B testing, data fusion and integration, data mining, machine learning, natural language processing, and statistics.
  • Strong proficiency with the Hadoop ecosystem, utilizing tools both on premises and on cloud platforms.
  • Proficiency with a variety of Python and Java libraries, such as Boto3, NumPy, pandas, Matplotlib, PySpark, DeepLearning4j, JSAT, MLlib, and JDMP.
  • Experience developing pipelines geared for scalability, performance, and maintainability, along with building monitoring and alerting systems.
  • Expertise in batch and real-time processing, creating end-to-end pipeline solutions for various ecosystems including AWS, Azure, and on-premises platforms.
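The batch side of such an end-to-end pipeline often reduces to a transform-then-load step with retries and an alert hook. A minimal plain-Python sketch of that pattern (all names are illustrative, not from a specific project):

```python
import time

def load_to_sink(batch, sink):
    """Hypothetical sink write; stands in for a warehouse or topic producer."""
    sink.extend(batch)

def run_batch(records, transform, sink, retries=3, alert=print):
    """Minimal batch step: transform the records, then load with retry
    and an alert hook for a monitoring system."""
    batch = [transform(r) for r in records]
    for attempt in range(1, retries + 1):
        try:
            load_to_sink(batch, sink)
            return len(batch)
        except Exception as exc:  # surface any load failure to the alert hook
            alert(f"load attempt {attempt} failed: {exc}")
            time.sleep(0)  # placeholder back-off
    raise RuntimeError("batch load failed after retries")

sink = []
n = run_batch([1, 2, 3], lambda x: x * 10, sink)
print(n, sink)  # → 3 [10, 20, 30]
```

The streaming side follows the same shape, with the transform applied per record instead of per batch.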


  • Python (8 years)
  • Java (6 years)
  • Scala (6 years)
  • Data Mining
  • ETL Pipelines
  • Fault tolerant system building
  • Cloud Development
  • Big Data Analytics
  • Communication & Leadership


Programming: Python, R, SQL, Scala, Java, JavaScript, Shell, MATLAB, C++

Libraries: Kafka-python, pySpark, numpy, Pandas, DL4J, ND4J, JSAT, JAVA-ML, MLlib, RankLib, Retina, JDMP, Encog, pymysql, boto3

Big Data Tools: Kafka, Spark, Storm, Cassandra, Flink, Cloudera, Hortonworks, HPCC, Qubole, Statwing, CouchDB, Pentaho, OpenRefine, RapidMiner, DataCleaner, Hive, MapReduce, MongoDB, Flume, Elasticsearch, Hadoop, Xplenty, AWS Glue, Alooma, Talend, Stitch, InfoSphere, Airflow, Kubernetes, Neo4j, SAMOA, Zookeeper, Avro, Apex, SQL, Pig, Sqoop

Big Data Methods: Batch and real-time data pipelines, Lambda and Step Functions architecture, authoring/scheduling/monitoring workflows with DAGs (Apache Airflow), data transformation, HTTP/MQTT endpoints, MapReduce batch compute, stream computations, machine learning frameworks, low-latency data stores, deployment

Data Visualization: Tableau, Matplotlib, Seaborn, Altair, ggplot2, Plotly

NLP: NLTK, Gensim, AWS Transcribe, Comprehend, GloVe, spaCy, OpenNLP, AllenNLP

Version Control: GitHub, Git, SVN, Mercurial, AWS CodeCommit, Azure DevOps Repos

IDE: Jupyter Notebook, PyCharm, Visual Studio, Spyder, Eclipse, Atom, IntelliJ IDEA

Big Data Ecosystems: Hadoop, Snowflake, Oracle Exadata, Vertica, Teradata, Pivotal Greenplum, SAP IQ

SQL RDBMS: Microsoft SQL, MySQL, Oracle DB, AWS RDS, T-SQL, PostgreSQL, IBM DB2, Amazon Aurora, Azure SQL, MariaDB, SQLite, Microsoft Access

NoSQL ONDMs: PyMongo, HappyBase, Boto3 (DynamoDB), EclipseLink, Hibernate

NoSQL Databases: MongoDB, Cassandra, Redis, HBase, Neo4j, Oracle NoSQL, Amazon DynamoDB, Couchbase, CouchDB



Confidential, Redmond, WA


  • Utilized Azure Kubernetes Services (AKS) for data ingestion clusters management
  • Worked with Azure Designer to design and upgrade existing data pipelines
  • Automated key end to end dataflow transformations and load balancing
  • Assisted in the creation of multiple endpoint APIs for Cortana services
  • Created new API triggers using Azure Functions providing simple solutions for complex orchestration challenges
  • Transformed data sent to Azure SQL data warehouse for easy accessibility.
  • Managed Docker containers via Kubernetes to ensure coordination of node clusters at scale in production.
  • Utilized NumPy and Pandas for exploratory data analysis
  • Used NLTK, Gensim, and GloVe for NLP preprocessing and embeddings
  • Utilized Apache Spark-based Azure Databricks to ingest data with Azure Data Factory in batches and in real time using Kafka.
  • Optimized dashboards on Power BI to ensure stable workflow and updated visualizations.
  • Led a team of five to ensure proper work distribution and meet project deadlines
  • Participated in daily Scrum stand-up meetings, presenting my team's accomplishments and upcoming goals
  • Utilized Ingress Controllers in Azure to route HTTP traffic to different applications
  • Made use of multiple cognitive APIs, including speech, language, Bing Search, and QnA services.
  • Optimized and redeployed core and value-add services surrounding Cortana on multiple platforms, including Windows, smartphones, the Xbox console, the Edge browser, and VR headsets
  • Managed the code repository using Git to keep the code base stable and ready to deploy at all times
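The Azure Functions triggers described above typically delegate to a plain routing function that can be exercised without the Azure runtime. A hypothetical sketch of that core logic (service names and payload shape are illustrative, not the production contract):

```python
import json

def handle_request(body: dict) -> dict:
    """Core routing logic an Azure Functions HTTP trigger might delegate to.
    The 'service' values and response fields here are illustrative only."""
    service = body.get("service")
    if service == "qna":
        return {"status": 200, "answer": f"echo: {body.get('question', '')}"}
    if service == "speech":
        return {"status": 202, "detail": "queued for transcription"}
    return {"status": 400, "detail": f"unknown service: {service!r}"}

resp = handle_request({"service": "qna", "question": "hello"})
print(json.dumps(resp))
```

Keeping the routing pure-Python like this lets it be unit-tested locally, with the Azure Functions entry point reduced to request parsing and response serialization.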


Confidential, Bloomington, IL


  • Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
  • Used AWS Kinesis for batch and real time streaming of data
  • Utilized Amazon Elastic MapReduce (EMR) for fast parallel computations
  • Created workflows with AWS Lambda and Step Functions for efficient pipeline flow
  • Worked in AWS Glue to create fully managed ETL pipelines for integration with Athena, Redshift and EMR
  • Worked with SQL and NoSQL databases such as RDS and MongoDB for different data pipelines, including text and customer data.
  • Built a data virtualization layer (Denodo base and derived views), built data visualizations in Tableau, and accessed aggregations using the SQL clients PostgreSQL and SQL Workbench.
  • Queried data from AWS RDS using Aurora Query Editor.
  • Collaborated with data science team and e-commerce team to successfully deploy and integrate the models.
  • Engineered an automated ETL pipeline for data ingestion and feature engineering using AWS SageMaker.
  • Managed the code repository using Git to ensure the integrity of the code base was maintained at all times
  • Used AWS tools such as Transcribe, Comprehend, and SageMaker to update and improve the framework of a phone-based virtual assistant.
  • Ensured the system architecture met business requirements, working constantly with different teams to ensure every aspect of the architecture benefited the company
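A Lambda step in a Kinesis pipeline like the one above receives batches of base64-encoded records. A minimal handler sketch (the event shape follows the standard Kinesis-to-Lambda format; the payload field names are illustrative):

```python
import base64
import json

def lambda_handler(event, context=None):
    """Sketch of a Lambda consuming a Kinesis batch: decode each base64
    record, parse its JSON payload, and collect transformed rows.
    The 'id' and 'spend' fields are hypothetical."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        doc = json.loads(payload)
        rows.append({"customer_id": doc["id"], "spend": float(doc["spend"])})
    return {"batch_size": len(rows), "rows": rows}

# Simulated Kinesis event for local testing
event = {"Records": [{"kinesis": {"data": base64.b64encode(
    json.dumps({"id": "c1", "spend": "9.5"}).encode()).decode()}}]}
print(lambda_handler(event))
```

Chaining several such handlers with Step Functions gives the retry and branching behavior without custom orchestration code.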




  • Worked in a Cloudera Hadoop environment, utilizing the Apache tech stack
  • Utilized Apache Kafka to stream sensor data from flight recorders
  • Transformed and mapped data utilizing Spark and MapReduce for parallel computation
  • Ran sensor data through several filters to eliminate noise for more accurate data modeling
  • Stored relational data in Hive tables, which data scientists could easily query
  • Managed data flow with Apache Airflow, ensuring proper and efficient scheduling and task execution
  • Assisted data scientists in creating a Tableau dashboard for a dynamic maintenance scheduler
  • Managed compute clusters using Kubernetes for efficient container orchestration
  • Used Jenkins for continuous integration automation, to ensure new flight recorder data streams can easily integrate with newly developed pipeline builds.
  • Worked in Agile Scrum environment, participating in daily scrum meetings and showcasing team contributions and accomplishments
  • Built data ingestion workflows using Apache NiFi, Schema Registry, and Spark Streaming
  • Worked extensively with shell scripting to ensure proper execution within Docker containers
  • Created data management policies and procedures, and set new standards for future development
  • Developed algorithms on high-performance systems, orchestrating workflows within a contained environment
  • Worked with end users to transform data into knowledge in focused and meaningful ways
  • Implemented and configured data pipelines, as well as tuned processes for performance and scalability
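The noise-filtering step mentioned above can be as simple as a moving-average smoother applied to each sensor channel before modeling. A plain-Python sketch (a stand-in for the production filters, which are not specified here):

```python
def moving_average(samples, window=3):
    """Smooth a sensor-reading sequence with a trailing moving average,
    damping short spikes before the data reaches the modeling stage."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        chunk = samples[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# A single-sample spike at index 2 is spread out and damped
print(moving_average([10, 10, 40, 10, 10]))  # → [10.0, 10.0, 20.0, 20.0, 20.0]
```

In a Spark job the same function would be applied per partition or expressed as a windowed aggregation, so the filter scales with the cluster rather than a single machine.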




  • Used the R package dplyr for data manipulation and analysis
  • Maintained and contributed to many internal R packages used for building and diagnosing models, and automated reporting
  • Used R to perform ad-hoc analyses and deeper drill downs into spend categories of particular interest to clients on a project-to-project basis
  • Performed large data cleaning and preparation tasks using R and SQL to gather information from disparate and incompatible data sources from across a client’s entire enterprise to provide a complete view of all indirect spend
  • Helped maintain a large database of commodity and vendor information using SQL
  • Maintained various visualization tools and dashboards used to provide data-driven insights
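The dplyr-style group-and-summarise work on spend categories described above maps to a simple grouping pattern; sketched here in Python, the résumé's primary language, with illustrative field names:

```python
from collections import defaultdict

def summarise_spend(rows):
    """Group indirect-spend rows by category and total the amounts - a
    plain-Python analogue of a dplyr group_by() then summarise() pipeline.
    The 'category' and 'amount' fields are hypothetical."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["amount"]
    return dict(sorted(totals.items()))

rows = [{"category": "IT", "amount": 120.0},
        {"category": "Travel", "amount": 80.0},
        {"category": "IT", "amount": 30.0}]
print(summarise_spend(rows))  # → {'IT': 150.0, 'Travel': 80.0}
```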
