Data Engineer Resume
NC
SUMMARY
- Overall 3+ years of experience as a Data Engineer, including design work, a Deep Learning Specialization, Machine Learning, and TensorFlow development.
- Expertise in writing end-to-end data processing jobs to analyse data using MapReduce, Spark and Hive.
- Experience with the Apache Spark ecosystem using Spark Core, SQL, Data Frames and RDDs, and knowledge of Spark MLlib.
- Experience with integration of Jira with third-party systems such as Service Now.
- Extensive knowledge of developing Spark Streaming jobs by building RDDs (Resilient Distributed Datasets) using Scala, PySpark and Spark-Shell.
- Experienced in data manipulation using Python for loading and extraction, as well as with Python libraries such as NumPy, SciPy and Pandas for data analysis and numerical computations.
- Experienced in using Pig scripts to perform transformations, event joins, filters and pre-aggregations before storing the data into HDFS.
- Strong knowledge of Hive analytical functions, extending Hive functionality by writing custom UDFs.
- Expertise in writing MapReduce jobs in Python for processing large sets of structured, semi-structured and unstructured data and storing them in HDFS.
- Good understanding of data modelling (Dimensional & Relational) concepts like Star-Schema Modelling, Snowflake Schema Modelling, Fact and Dimension tables.
- Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.
- Hands-on experience working with Amazon Web Services (AWS) using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
- Strong experience in working with Windows, Linux and Mac environments, writing shell scripts.
- Experienced in working in SDLC, Agile and Waterfall Methodologies.
- Strong analytical, presentation, communication and problem-solving skills, with the ability to work independently as well as in a team and to follow the best practices and principles defined for the team.
TECHNICAL SKILLS
Programming Languages: Python, R, MATLAB
Packages: SciPy, NumPy, Pandas, scikit-learn, matplotlib, NLTK, spaCy, Keras, PySpark, Stanford NLP Stanza
NLP: Named Entity Recognition, POS tagging, Parsing, Vectorization, Tagging, Sentiment Analysis, Text Classification, Clustering, etc.
Databases: MySQL, SQL
Frameworks: Machine Learning & Deep Learning (Keras, TensorFlow, PyTorch), WEKA, CNN, BERT, Transformers, spaCy
Project Management Tools: Jira, Git
Operating Systems: Windows, Linux, Mac
Methodologies: Agile, Scrum, Waterfall
Cloud Technologies: Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure
Container tools: Docker
PROFESSIONAL EXPERIENCE
Confidential, NC
Data Engineer
Responsibilities:
- Responsible for the execution of big data analytics, predictive analytics and machine learning initiatives.
- Implemented a proof of concept deploying this product in an AWS S3 bucket.
- Utilized AWS services with a focus on big data architecture/analytics/enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful and valuable information for better decision-making.
- Worked on PySpark APIs for data transformations.
- Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
- Upgraded the current Linux version to RHEL version 5.6.
- Expertise in hardening Linux servers and compiling, building and installing Apache Server from source with minimal modules.
- Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
- Developed PySpark scripts to encrypt raw data by applying hashing algorithms to client-specified columns (see the PySpark hashing sketch after this list).
- Tuned SQL queries to bring down run time by working on indexes and execution plans.
- Created a validation set using Keras2DML to test whether the trained model was working as intended.
- Defined multiple helper functions used while running the neural network in a session, along with placeholders and the number of neurons in each layer.
- Created the neural network's computational graph after defining weights and biases.
- Created a TensorFlow session used to run the neural network and validate the accuracy of the model on the validation set (see the TensorFlow sketch after this list).
- After executing the program and achieving acceptable validation accuracy, created a submission that is stored in the submission directory.
- Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
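
A minimal sketch of the column-hashing approach described above; the S3 paths, column names and choice of SHA-256 are illustrative assumptions, not the client's actual specification:

```python
# Illustrative sketch: hash client-specified columns with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("column-hashing").getOrCreate()

raw_df = spark.read.parquet("s3://bucket/raw/")      # hypothetical source path
sensitive_cols = ["ssn", "email", "phone"]           # hypothetical client-specified columns

masked_df = raw_df
for col in sensitive_cols:
    # sha2 returns a hex digest; 256 selects SHA-256
    masked_df = masked_df.withColumn(col, F.sha2(F.col(col).cast("string"), 256))

masked_df.write.mode("overwrite").parquet("s3://bucket/masked/")
```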
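
A minimal sketch of the placeholder/graph/session workflow described above, in TensorFlow 1.x style; the layer sizes, optimizer and learning rate are illustrative assumptions:

```python
# Illustrative sketch: placeholders, weights/biases graph, and a session
# that trains and validates the network (TensorFlow 1.x style).
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

n_input, n_hidden, n_classes = 784, 128, 10   # assumed layer sizes

# Placeholders for features and one-hot labels
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Weights and biases define the computational graph
weights = {"h1": tf.Variable(tf.random_normal([n_input, n_hidden])),
           "out": tf.Variable(tf.random_normal([n_hidden, n_classes]))}
biases = {"h1": tf.Variable(tf.zeros([n_hidden])),
          "out": tf.Variable(tf.zeros([n_classes]))}

def forward(inputs):
    """Helper function used while running the network in a session."""
    hidden = tf.nn.relu(tf.matmul(inputs, weights["h1"]) + biases["h1"])
    return tf.matmul(hidden, weights["out"]) + biases["out"]

logits = forward(x)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
accuracy = tf.reduce_mean(
    tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1)), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # train_x/train_y and val_x/val_y would come from the prepared data sets:
    # sess.run(train_op, feed_dict={x: train_x, y: train_y})
    # print("validation accuracy:",
    #       sess.run(accuracy, feed_dict={x: val_x, y: val_y}))
```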
Confidential
Data Engineer, Co-founder
Responsibilities:
- Collaborated with data engineers and the operations team to implement the ETL process, and wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Performed data analysis using Hive and SQL to retrieve data from Redshift.
- Explored and analysed the customer specific features by using Spark SQL.
- Performed data imputation using Scikit-learn package in Python.
- Responsible for ETL development with successful design, development, and integration of components within the Talend ETL Platform and Java Technology.
- Participated in feature engineering such as feature intersection generation, feature normalization and label encoding with Scikit-learn preprocessing.
- Created complex JIRA workflows including project workflows, custom fields, notification schemes, reports and dashboards in JIRA.
- Migrated and upgraded Jira from Oracle to PostgreSQL environments.
- Worked on creating complex stored procedures, SSIS packages, triggers, cursors, tables, views and other SQL joins and statements for applications.
- Designed and implemented recommender systems that used collaborative filtering techniques to recommend courses to different customers, and deployed them to an AWS EMR cluster (see the ALS sketch after this list).
- Utilized natural language processing (NLP) techniques to optimize customer satisfaction.
- Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
- Developed containment scripts for data reconciliation using SQL and Python.
- Performed data analysis and data profiling using complex SQL on various source systems including MySQL and Teradata.
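
A minimal sketch of the collaborative-filtering recommender described above, using Spark MLlib's ALS; the ratings schema, S3 paths and hyperparameters are illustrative assumptions, and the job would be submitted to the EMR cluster via spark-submit:

```python
# Illustrative sketch: course recommendations via ALS collaborative filtering.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("course-recommender").getOrCreate()

# Hypothetical feedback table with columns (user_id, course_id, rating)
ratings = spark.read.parquet("s3://bucket/course_ratings/")

als = ALS(
    userCol="user_id",
    itemCol="course_id",
    ratingCol="rating",
    rank=10,
    maxIter=10,
    regParam=0.1,
    coldStartStrategy="drop",   # skip users/items unseen at training time
)
model = als.fit(ratings)

# Top-5 course recommendations per customer
recommendations = model.recommendForAllUsers(5)
recommendations.write.mode("overwrite").parquet("s3://bucket/recommendations/")
```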