SR BIG DATA ENGINEER Resume Redmond, WA - Hire IT People

SUMMARY:

7 Years Big Data Engineer Experience
An accomplished Big Data Engineer with over seven years of experience applying a strong working knowledge of big data system architecture and ETL pipeline building techniques to a variety of real-world business problems, yielding lean, actionable results and insights for improvement.
A highly organized and efficient individual whose leadership and thorough, precise approach to projects has yielded excellent results.
Expert Python / Java / Scala developer specializing in developing and deploying big data solutions.
Always on top of the current trends in relevant technologies, shifts in the big data climate, and improvements in existing methodologies.
Strong leadership skills with specific experience in the Agile framework.
Ability to take data engineering beyond the proof-of-concept stage and into full production deployment.
Extensive experience with third party cloud resources: AWS, Google Cloud, Azure
Expertise in all common data engineering techniques: A/B testing, Data Fusion and integration, data mining, machine learning, natural language processing, statistics.
Strong proficiency with the Hadoop ecosystem, utilizing tools both on-prem and on cloud platforms.
Proficiency with a variety of Python and Java libraries such as: Boto3, NumPy, pandas, Matplotlib, PySpark, DeepLearning4j, JSAT, MLlib, JDMP.
Experience in developing pipelines geared for scalability, performance, and ease of maintenance, and in creating monitoring and alert systems.
Expertise in batch and real-time processing, creating end-to-end pipeline solutions for various ecosystems including AWS, Azure, and on-prem platforms.

TOP SKILLS:

Python (8 years)
Java (6 years)
Scala (6 years)
Data Mining
ETL Pipelines
Fault tolerant system building
Cloud Development
Big Data Analytics
Communication & Leadership

TECHNICAL SKILLS

Programming: Python, R, SQL, Scala, Java, JavaScript, Shell, MATLAB, C++

Libraries: Kafka-python, pySpark, numpy, Pandas, DL4J, ND4J, JSAT, JAVA-ML, MLlib, RankLib, Retina, JDMP, Encog, pymysql, boto3

Big Data Tools: Kafka, Spark, Storm, Cassandra, Flink, Cloudera, HortonWorks, HPCC, Qubole, Statwing, CouchDB, Pentaho, OpenRefine, RapidMiner, Data Cleaner, Hive, MapReduce, MongoDB, Flume, Elasticsearch, Hadoop, Xplenty, AWS Glue, Alooma, Talend, Stitch, Infosphere, Airflow, Kubernetes, Neo4j, SAMOA, Zookeeper, Avro, Apex, SQL, Pig, Sqoop

Big Data Methods: Batch and real-time data pipelines, Lambda and Step Functions architecture, authoring, scheduling, and monitoring workflows with DAGs (Apache Airflow), data transformation, HTTP / MQTT endpoints, map-reduce batch compute, stream computations, machine learning frameworks, low-latency data stores, deployment

Data Visualization: Tableau, Matplotlib, Seaborn, Altair, ggplot2, Plotly

NLP: NLTK, Gensim, AWS Transcribe, Comprehend, GloVe, spaCy, OpenNLP, AllenNLP

Version Control: GitHub, Git, SVN, Mercurial, AWS CodeCommit, Azure DevOps Repos

IDE: Jupyter Notebook, PyCharm, Visual Studio, Spyder, Eclipse, Atom, IntelliJ IDEA

Big Data Ecosystems: Hadoop, SnowFlake, Oracle ExaData, Vertica, Teradata, Pivotal Greenplum, SAP IQ

SQL RDBMS: Microsoft SQL, MySQL, Oracle DB, AWS RDS, T-SQL, PostgreSQL, IBM DB2, Amazon Aurora, Azure SQL, MariaDB, SQLite, Microsoft Access

NoSQL ONDMs: PyMongo, HappyBase, Boto3 (DynamoDB), EclipseLink, Hibernate

NOSQL Database: MongoDB, Cassandra, Redis, HBase, Neo4j, Oracle NoSQL, Amazon DynamoDB, Couchbase, CouchDB

PROFESSIONAL EXPERIENCE:

SR BIG DATA ENGINEER

Confidential, Redmond, WA

Responsibilities:

Utilized Azure Kubernetes Service (AKS) to manage data ingestion clusters
Worked with Azure Designer to design and upgrade existing data pipelines
Automated key end to end dataflow transformations and load balancing
Assisted in the creation of multiple endpoint APIs for Cortana services
Created new API triggers using Azure Functions providing simple solutions for complex orchestration challenges
Transformed data sent to Azure SQL data warehouse for easy accessibility.
Managed Docker containers via Kubernetes to ensure coordination of node clusters at scale in production.
Utilized NumPy and pandas for exploratory data analysis
Used the NLTK, Gensim, and GloVe libraries for NLP preprocessing and embedding
Utilized Apache Spark-based Azure Databricks to ingest data with Azure Data Factory in batches and in real time using Kafka.
Optimized dashboards on Power BI to ensure stable workflow and updated visualizations.
Led a team of five, ensuring proper work distribution and that project deadlines were met
Participated in daily scrum stand-up meetings, presenting my team's accomplishments and future goals
Utilized Ingress Controllers in Azure to route HTTP traffic to different applications
Made use of multiple cognitive APIs including speech, language, Bing Search, and QnA services.
Optimized and redeployed core and value-add services surrounding Cortana on multiple platforms such as Windows, smartphones, Xbox console, Edge browser, and VR headsets
Managed the code repository using Git to ensure the code base remained stable and ready to deploy at all times

BIG DATA ENGINEER

Confidential, Bloomington, IL

Responsibilities:

Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
Used AWS Kinesis for batch and real-time streaming of data
Utilized Amazon Elastic MapReduce (EMR) for fast parallel computations
Created workflows with AWS Lambda and Step Functions for efficient pipeline flow
Worked in AWS Glue to create fully managed ETL pipelines for integration with Athena, Redshift and EMR
Worked with SQL and NoSQL databases such as RDS and MongoDB for different data pipelines, including text and customer data.
Built Data virtualization layer (DENODO Base and Derived views), Data visualization using Tableau and accessed aggregations using SQL Clients PostgreSQL & SQL-Workbench.
Queried data from AWS RDS using Aurora Query Editor.
Collaborated with data science team and e-commerce team to successfully deploy and integrate the models.
Engineered an automated ETL pipeline for data ingestion and feature engineering using AWS Sagemaker.
Managed the code repository using Git to ensure the integrity of the code base was maintained at all times
Used AWS tools such as Transcribe, Comprehend, and SageMaker to update and improve the framework of the Phone Virtual Assistant.
Ensured system architecture met business requirements, working continually with different teams so that every aspect of the architecture benefited the company

BIG DATA ENGINEER

Confidential

Responsibilities:

Worked in a Cloudera Hadoop environment, utilizing the Apache tech stack
Utilized Apache Kafka to stream sensor data from flight recorders
Transformed and mapped data utilizing Spark and MapReduce for parallel computation
Ran sensor data through several filters to eliminate noise for more accurate data modeling
Stored relational data in Hive tables, which were easily queried by data scientists
Managed data flow with Apache Airflow, ensuring proper and efficient scheduling and task execution
Assisted data scientists in creating a Tableau dashboard for a dynamic maintenance scheduler
Managed compute clusters using Kubernetes for efficient container orchestration
Used Jenkins for continuous integration automation, to ensure new flight recorder data streams can easily integrate with newly developed pipeline builds.
Worked in Agile Scrum environment, participating in daily scrum meetings and showcasing team contributions and accomplishments
Built data ingestion workflows using Apache NiFi, Schema Registry, and Spark Streaming
Worked extensively with shell scripting to ensure proper execution with Docker containers
Created data management policies, procedures and set new standards to be used in future development
Developed algorithms on high-performance systems, orchestrating workflows within a contained environment
Worked with end users to ensure transformation of data to knowledge in very focused and meaningful ways
Implemented and configured data pipelines as well as tuning processes for performance and scalability

DATA ANALYST

Confidential

Responsibilities:

Used the R package dplyr for data manipulation and analysis
Maintained and contributed to many internal R packages used for building and diagnosing models, and automated reporting
Used R to perform ad-hoc analyses and deeper drill downs into spend categories of particular interest to clients on a project-to-project basis
Performed large data cleaning and preparation tasks using R and SQL to gather information from disparate and incompatible data sources from across a client’s entire enterprise to provide a complete view of all indirect spend
Helped maintain a large database of commodity and vendor information using SQL
Maintained various visualization tools and dashboards used to provide data-driven insights
