Data Engineer Resume
NJ
SUMMARY
- Data Engineer with around 2 years of experience in Data Extraction, Data Modeling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
- Comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive).
- Extensively worked on Spark for analytics: installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
- Excellent programming skills in Python at a high level of abstraction.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for storage.
- Experienced in working with in-memory processing frameworks such as Spark Transformations, Spark SQL, MLlib, and Spark Streaming.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models and Decision Trees.
- Proficient in Statistical Methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, and Discriminant Analysis.
- Knowledge of Natural Language Processing (NLP) algorithms and Text Mining.
- Worked in large-scale database environments such as Hadoop and MapReduce, with a working understanding of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Strong experience with Python (2.x, 3.x) for developing analytic models and solutions.
- Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark (see the PySpark sketch after this list).
- Proficient in data visualization tools such as Tableau, Python Matplotlib, and R Shiny, creating visually powerful and actionable interactive reports and dashboards.
- Experienced Tableau Developer with expertise in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau (9.x/10.x).
- Experienced in Agile methodology and the Scrum process.
- Strong business sense and the ability to communicate data insights to both technical and non-technical clients.
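
A minimal sketch of the Spark-with-Hive analytics described above, assuming a Hive table named sales with region and amount columns (hypothetical names):

    from pyspark.sql import SparkSession, functions as F

    # Start a Spark session with Hive support so Spark SQL can read Hive tables.
    spark = (SparkSession.builder
             .appName("sales-analytics")
             .enableHiveSupport()
             .getOrCreate())

    # Aggregate with the DataFrame API...
    totals = (spark.table("sales")
              .groupBy("region")
              .agg(F.sum("amount").alias("total_amount"))
              .orderBy(F.desc("total_amount")))

    # ...or express the same query in Spark SQL.
    totals_sql = spark.sql(
        "SELECT region, SUM(amount) AS total_amount "
        "FROM sales GROUP BY region ORDER BY total_amount DESC")

    totals.show()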
TECHNICAL SKILLS
Languages: Python, R, C++, Java, SQL, JavaScript, HTML, CSS
Databases: MySQL, SQL Server, MongoDB
Tools: Tableau, SAS, RapidMiner, TensorFlow, MapReduce/Hadoop, Power BI, JIRA, GitHub, Scikit-learn, AWS
Data Analytics & Machine Learning: Linear Regression, Logistic Regression, K-Means, KNN, Decision Trees, Cluster Analysis, Neural Networks, Naïve Bayes
Statistical Tools: Time Series, Regression Models, Principal Component Analysis, Hypothesis Testing, A/B Testing, Confidence Intervals, T-test, ANOVA, NLP
Operating System: Windows, Linux, Mac
Methodologies: Agile, SDLC, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, NJ
Data Engineer
Responsibilities:
- Utilized the Waterfall methodology of development for Requirements, Planning, Design, and Deployment.
- Worked on data cleaning and ensured data quality, consistency, and integrity using NumPy, SciPy, and Pandas.
- Used Python collections to manipulate and loop through user-defined objects.
- Worked on a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts (see the Pandas sketch after this list).
- Utilized ODBC connectivity to Teradata via MS Excel to retrieve data automatically from the Teradata database.
- Performed data analysis and logical and physical data modeling for OLAP systems.
- Designed and deployed reports with Drill Down, Drill Through, and Drop-down menu options, as well as Parameterized and Linked reports, using Tableau.
- Proficient with Python 3.x, including NumPy, Scikit-learn, Pandas, Matplotlib, Seaborn, and NLP libraries.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Applied data warehousing principles such as Fact Tables, Dimension Tables, and Dimensional Data Modeling (Star Schema and Snowflake Schema).
- Wrote and optimized diverse SQL queries; working knowledge of RDBMSs such as SQL Server and MySQL.
- Performed data validation and cleansing of staged input records before loading them into the data warehouse.
- Working knowledge of Sourcetree with GitHub.
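
A minimal sketch of the kind of Pandas/NumPy cleaning automation described above; the file name and column names are hypothetical:

    import numpy as np
    import pandas as pd

    # Load raw records gathered from multiple sources (hypothetical CSV).
    df = pd.read_csv("customers_raw.csv")

    # Normalize column names and strip stray whitespace in string fields.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())

    # Standardize missing-value markers, then drop exact duplicates.
    df = df.replace({"": np.nan, "N/A": np.nan, "null": np.nan})
    df = df.drop_duplicates()

    # Enforce types and basic integrity checks before downstream loads.
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    df = df.dropna(subset=["customer_id"])

    df.to_csv("customers_clean.csv", index=False)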
Confidential, NJ
Data Engineer (Intern)
Responsibilities:
- Experienced working in Agile methodologies.
- Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to extract data to fit analytical requirements.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
- Explored and analyzed customer-specific features using Spark SQL.
- Performed data imputation using the Scikit-learn package in Python.
- Responsible for ETL development, with successful design, development, and integration of components within the Talend ETL platform and Java technology.
- Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing (see the sketch after this list).
- Created complex stored procedures, SSIS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.
- Developed Tableau data visualizations using Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, and Density Charts.
- Designed and implemented recommender systems that used collaborative filtering techniques to recommend courses to different customers, and deployed them to an AWS EMR cluster.
- Utilized natural language processing (NLP) techniques to optimize customer satisfaction.
- Designed rich data visualizations to present data in human-readable form with Tableau and Matplotlib.
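
A minimal sketch of the imputation and feature-engineering steps described above, using Scikit-learn preprocessing; the input file and column names are hypothetical:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

    df = pd.read_csv("courses.csv")  # hypothetical input

    numeric_cols = ["age", "sessions_per_week"]
    categorical_cols = ["plan_type"]

    # Impute missing values, then normalize numeric features and
    # one-hot encode categorical features.
    numeric_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    features = ColumnTransformer([
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ])
    X = features.fit_transform(df)

    # Label-encode the target column.
    y = LabelEncoder().fit_transform(df["completed"])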
Confidential
Data Engineer
Responsibilities:
- Understood data visualization requirements from the business users.
- Wrote SQL queries to extract data from the Sales data marts per the requirements.
- Developed Tableau data visualizations using Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, and Density Charts.
- Tuned Oracle databases and application SQL, working closely with the development team on SQL tuning.
- Developed containment scripts for data reconciliation using SQL and Python (see the sketch after this list).
- Performed data analysis and data profiling using complex SQL on various source systems, including MySQL and Teradata.
- Evaluated data profiling, cleansing, integration, and extraction tools (e.g., Informatica).
- Wrote user-defined functions (UDFs) in Hive to manipulate strings, dates, and other data.
- Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
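
A minimal sketch of the SQL-plus-Python reconciliation scripting described above; the connection strings, table names, and key column are all hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connections to the two systems being reconciled.
    src = create_engine("mysql+pymysql://user:pass@src-host/sales")
    tgt = create_engine("mysql+pymysql://user:pass@tgt-host/warehouse")

    # Pull the key and a comparison column from each side with plain SQL.
    src_df = pd.read_sql("SELECT order_id, total FROM orders", src)
    tgt_df = pd.read_sql("SELECT order_id, total FROM fact_orders", tgt)

    # Rows present in the source but missing from the target.
    merged = src_df.merge(tgt_df, on="order_id", how="outer",
                          suffixes=("_src", "_tgt"), indicator=True)
    missing = merged[merged["_merge"] == "left_only"]

    # Rows present on both sides but with mismatched totals.
    both = merged[merged["_merge"] == "both"]
    mismatched = both[both["total_src"] != both["total_tgt"]]

    print(f"missing in target: {len(missing)}, value mismatches: {len(mismatched)}")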