Data Engineer Resume
NJ
SUMMARY
- Data Engineer with around 2 years of experience in Data Extraction, Data Modeling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
- Comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive).
- Extensively worked on Spark for analytics: installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
- Excellent programming skills in Python at a high level of abstraction.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for storage.
- Experienced in working with in-memory processing frameworks such as Spark Transformations, Spark SQL, MLlib, and Spark Streaming.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models and Decision Trees.
- Proficient in Statistical Methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, and Discriminant Analysis.
- Knowledge of Natural Language Processing (NLP) algorithms and Text Mining.
- Worked in large-scale database environments such as Hadoop and MapReduce, with a working understanding of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Strong experience with Python (2.x, 3.x) for developing analytic models and solutions.
- Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark (see the PySpark sketch after this list).
- Proficient in data visualization tools such as Tableau, Python Matplotlib, and R Shiny, creating visually powerful and actionable interactive reports and dashboards.
- Experienced Tableau Developer with expertise in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau (9.x/10.x).
- Experienced in Agile methodology and the Scrum process.
- Strong business sense and the ability to communicate data insights to both technical and non-technical clients.
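
A minimal sketch of the Spark-with-Hive analytics described above, assuming a Hive table named sales with region and amount columns (hypothetical names):

    from pyspark.sql import SparkSession, functions as F

    # Start a Spark session with Hive support so Spark SQL can read Hive tables.
    spark = (SparkSession.builder
             .appName("sales-analytics")
             .enableHiveSupport()
             .getOrCreate())

    # Aggregate with the DataFrame API...
    totals = (spark.table("sales")
              .groupBy("region")
              .agg(F.sum("amount").alias("total_amount"))
              .orderBy(F.desc("total_amount")))

    # ...or express the same query in Spark SQL.
    totals_sql = spark.sql(
        "SELECT region, SUM(amount) AS total_amount "
        "FROM sales GROUP BY region ORDER BY total_amount DESC")

    totals.show()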
TECHNICAL SKILLS
Languages: Python, R, C++, Java, SQL, JavaScript, HTML, CSS
Databases: MySQL, SQL Server, MongoDB
Tools: Tableau, SAS, RapidMiner, TensorFlow, MapReduce/Hadoop, Power BI, JIRA, GitHub, Scikit-learn, AWS
Data Analytics & Machine Learning: Linear Regression, Logistic Regression, K-Means, KNN, Decision Trees, Cluster Analysis, Neural Networks, Naïve Bayes
Statistical Tools: Time Series, Regression Models, Principal Component Analysis, Hypothesis Testing, A/B Testing, Confidence Intervals, T-test, ANOVA, NLP
Operating System: Windows, Linux, Mac
Methodologies: Agile, SDLC, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, NJ
Data Engineer
Responsibilities:
- Utilized the Waterfall methodology of development for Requirements, Planning, Design, and Deployment.
- Worked on data cleaning and ensured data quality, consistency, and integrity using NumPy, SciPy, and Pandas.
- Used Python collections to manipulate and loop through user-defined objects.
- Worked on a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts (see the Pandas sketch after this list).
- Utilized ODBC connectivity to Teradata via MS Excel to retrieve data automatically from the Teradata database.
- Performed data analysis and logical and physical data modeling for OLAP systems.
- Designed and deployed reports with Drill Down, Drill Through, and Drop-down menu options, as well as Parameterized and Linked reports, using Tableau.
- Proficient with Python 3.x, including NumPy, Scikit-learn, Pandas, Matplotlib, Seaborn, and NLP libraries.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Applied data warehousing principles such as Fact Tables, Dimension Tables, and Dimensional Data Modeling (Star Schema and Snowflake Schema).
- Wrote and optimized diverse SQL queries; working knowledge of RDBMSs such as SQL Server and MySQL.
- Performed data validation and cleansing of staged input records before loading them into the data warehouse.
- Working knowledge of Sourcetree with GitHub.
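
A minimal sketch of the kind of Pandas/NumPy cleaning automation described above; the file name and column names are hypothetical:

    import numpy as np
    import pandas as pd

    # Load raw records gathered from multiple sources (hypothetical CSV).
    df = pd.read_csv("customers_raw.csv")

    # Normalize column names and strip stray whitespace in string fields.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())

    # Standardize missing-value markers, then drop exact duplicates.
    df = df.replace({"": np.nan, "N/A": np.nan, "null": np.nan})
    df = df.drop_duplicates()

    # Enforce types and basic integrity checks before downstream loads.
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    df = df.dropna(subset=["customer_id"])

    df.to_csv("customers_clean.csv", index=False)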
Confidential, NJ
Data Engineer (Intern)
Responsibilities:
- Experienced working in Agile methodologies.
- Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to extract data to fit analytical requirements.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
- Explored and analyzed customer-specific features using Spark SQL.
- Performed data imputation using the Scikit-learn package in Python.
- Responsible for ETL development, with successful design, development, and integration of components within the Talend ETL platform and Java technology.
- Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing (see the sketch after this list).
- Created complex stored procedures, SSIS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.
- Developed Tableau data visualizations using Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, and Density Charts.
- Designed and implemented recommender systems that used collaborative filtering techniques to recommend courses to different customers, and deployed them to an AWS EMR cluster.
- Utilized natural language processing (NLP) techniques to optimize customer satisfaction.
- Designed rich data visualizations to present data in human-readable form with Tableau and Matplotlib.
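
A minimal sketch of the imputation and feature-engineering steps described above, using Scikit-learn preprocessing; the input file and column names are hypothetical:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

    df = pd.read_csv("courses.csv")  # hypothetical input

    numeric_cols = ["age", "sessions_per_week"]
    categorical_cols = ["plan_type"]

    # Impute missing values, then normalize numeric features and
    # one-hot encode categorical features.
    numeric_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    features = ColumnTransformer([
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ])
    X = features.fit_transform(df)

    # Label-encode the target column.
    y = LabelEncoder().fit_transform(df["completed"])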
Confidential
Data Engineer
Responsibilities:
- Understood data visualization requirements from the business users.
- Wrote SQL queries to extract data from the Sales data marts per the requirements.
- Developed Tableau data visualizations using Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, and Density Charts.
- Tuned Oracle databases and application SQL, working closely with the development team on SQL tuning.
- Developed containment scripts for data reconciliation using SQL and Python (see the sketch after this list).
- Performed data analysis and data profiling using complex SQL on various source systems, including MySQL and Teradata.
- Evaluated data profiling, cleansing, integration, and extraction tools (e.g., Informatica).
- Wrote user-defined functions (UDFs) in Hive to manipulate strings, dates, and other data.
- Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
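
A minimal sketch of the SQL-plus-Python reconciliation scripting described above; the connection strings, table names, and key column are all hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connections to the two systems being reconciled.
    src = create_engine("mysql+pymysql://user:pass@src-host/sales")
    tgt = create_engine("mysql+pymysql://user:pass@tgt-host/warehouse")

    # Pull the key and a comparison column from each side with plain SQL.
    src_df = pd.read_sql("SELECT order_id, total FROM orders", src)
    tgt_df = pd.read_sql("SELECT order_id, total FROM fact_orders", tgt)

    # Rows present in the source but missing from the target.
    merged = src_df.merge(tgt_df, on="order_id", how="outer",
                          suffixes=("_src", "_tgt"), indicator=True)
    missing = merged[merged["_merge"] == "left_only"]

    # Rows present on both sides but with mismatched totals.
    both = merged[merged["_merge"] == "both"]
    mismatched = both[both["total_src"] != both["total_tgt"]]

    print(f"missing in target: {len(missing)}, value mismatches: {len(mismatched)}")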