Data Analyst Resume
Princeton, NJ
SUMMARY
- 7 years of experience across a variety of industries, including Big Data technologies (the Apache Hadoop stack and Apache Spark), Python/Java, web technologies, and ETL
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
- Experienced working with various Hadoop Distributions (Amazon EMR, Cloudera, Hortonworks, MapR) to fully implement and leverage new Hadoop features
- Experience with SQL on Hadoop using tools such as Hive, Impala, and Spark SQL, and with Sqoop for data transfer
- Experience developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs (see the first PySpark sketch after this summary)
- Worked with real-time data processing and streaming using Spark Streaming and Kafka (see the streaming sketch after this summary)
- Experience moving data between HDFS and relational database systems (RDBMS) using Apache Sqoop
- Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning Hive queries (see the Hive DDL sketch after this summary)
- Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing
- Validated data using PySpark programs
- Experience developing Kafka producers and consumers that stream millions of events per second
- Strong understanding of real-time streaming technologies (Spark and Kafka)
- Knowledge of workflow management and coordination tools such as Oozie
- Strong experience building end to end data pipelines on Hadoop platform
- Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase
- Strong understanding of logical and physical database models and entity-relationship modeling
- Experience with Software development tools such as JIRA, Play, GIT, Bitbucket, Bamboo
- Good understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data
- Strong understanding of the Java Virtual Machine and multithreaded processing
- Experienced in using Agile methodologies, including Extreme Programming (XP), Scrum, and Test-Driven Development (TDD)
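The Spark bullets above reference three API levels; here is a minimal PySpark sketch, not drawn from any project listed here, showing the same per-user aggregation through the DataFrame API, Spark SQL, and the RDD API. The events.json path and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

    # DataFrame API: read JSON events (hypothetical file) and count events per user
    df = spark.read.json("events.json")
    per_user = df.groupBy("user_id").agg(F.count("*").alias("events"))

    # The same aggregation through Spark SQL over a temporary view
    df.createOrReplaceTempView("events")
    per_user_sql = spark.sql(
        "SELECT user_id, COUNT(*) AS events FROM events GROUP BY user_id")

    # The same aggregation through the lower-level RDD API
    per_user_rdd = (df.rdd.map(lambda row: (row["user_id"], 1))
                          .reduceByKey(lambda a, b: a + b))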
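For the Spark Streaming and Kafka bullets, a minimal Structured Streaming sketch that reads a Kafka topic and maintains a running record count; the broker address and topic name are placeholders, and it assumes the spark-sql-kafka connector is on the classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

    # Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders)
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "transactions")
              .load())

    # Kafka delivers the value as binary; cast to string, then keep a running count
    counts = (stream.select(F.col("value").cast("string").alias("payload"))
                    .groupBy()
                    .count())

    # Emit the updated count to the console on each micro-batch
    query = (counts.writeStream
                   .outputMode("complete")
                   .format("console")
                   .start())
    query.awaitTermination()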
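For the Hive partitioning and bucketing bullet, an illustrative DDL sketch issued through a Hive-enabled Spark session; the table and column names are made up.

    from pyspark.sql import SparkSession

    # Hive support must be enabled to create partitioned/bucketed warehouse tables
    spark = (SparkSession.builder.appName("hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Illustrative DDL: partition by load date, bucket by customer id
    spark.sql("""
        CREATE TABLE IF NOT EXISTS txns (
            txn_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)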
TECHNICAL SKILLS
Programming Languages: Python, PySpark, SQL, Shell/Bash, Java, Spark SQL, Hive, R, C, C++
Internet Technologies: JavaScript, Chart.js, D3.js, HTML5, CSS3, PHP, Bootstrap, Angular, REST APIs, Airflow
Databases: Hive (DW), MySQL, MongoDB, Cassandra, PostgreSQL, Redshift (DW)
IDEs/Development tools: Jupyter Notebook, Postman, IntelliJ, Eclipse (Java EE), GitHub, MongoDB Compass, Tableau
Platforms: Linux, Ubuntu, macOS, Windows
PROFESSIONAL EXPERIENCE
Confidential, Paoli, PA
Data Engineer
Responsibilities:
- Working on the legal compliance team within financial services; day-to-day work includes large-scale financial data management, strategizing and implementing efficient data architecture for financial crime (FC) detection teams, and performing scalable batch/stream data processing.
- Responsible for migrating, ingesting, and transforming large-scale raw transactional datasets into standardized, scalable data products for the FC teams (see the PySpark sketch below).
Technologies used: PySpark, Python, Bash/Shell, SQL, Hadoop, Spark SQL, Splunk, Hive, MapReduce, Sqoop, Flume, AWS, EMR, S3, EC2, Hue, Tableau
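A minimal sketch of the kind of batch standardization job described above; the S3 paths and column names are hypothetical, since the real datasets are confidential.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("standardize-sketch").getOrCreate()

    # Hypothetical raw transactions landed on S3 as headered CSV
    raw = spark.read.option("header", "true").csv("s3://bucket/raw/transactions/")

    # Standardize: typed amounts, trimmed ids, derived partition date, drop rows missing keys
    clean = (raw.withColumn("amount", F.col("amount").cast("double"))
                .withColumn("account_id", F.trim(F.col("account_id")))
                .withColumn("txn_date", F.to_date("txn_ts"))
                .dropna(subset=["account_id", "txn_ts"]))

    # Write a partitioned, columnar data product for downstream teams
    (clean.write.mode("overwrite")
          .partitionBy("txn_date")
          .parquet("s3://bucket/products/transactions/"))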
Confidential, Princeton, NJ
Data Engineer
Responsibilities:
- Creating data pipelines, strategizing and implementing microservice-based data infrastructure and REST APIs, scraping raw web content, and storing it in the cloud.
- Ensuring efficient data management to reduce cost, writing ML models for client-focused project solutions, and participating in the whole project lifecycle.
- Generated client-facing reports and created visualizations using Plotly and Tableau; worked with Big Data technologies, Python, Beautiful Soup, REST web services, AWS (S3, EC2, EMR), MS Azure, Flask, and MySQL.
- Worked on REST APIs, scraping and crawling large volumes of web data, data cleaning, data preprocessing, creating visualizations, performing machine learning, and implementing data pipelines (see the scraping sketch below).
- Worked on Big Data technologies in the Hadoop ecosystem.
Technologies used: Python, Beautiful Soup, Requests, REST APIs, MySQL, Spark, Hive, MapReduce, S3, EC2
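A minimal sketch of the scraping work described above, using Requests and BeautifulSoup; the URL and the choice of h2 headings are purely illustrative.

    import requests
    from bs4 import BeautifulSoup

    def scrape_headings(url):
        """Fetch a page and return the text of its h2 headings."""
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        return [h.get_text(strip=True) for h in soup.find_all("h2")]

    if __name__ == "__main__":
        # example.com is a placeholder target
        print(scrape_headings("https://example.com"))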
Confidential
Responsibilities:
- Asynchronously scraped text from thousands of websites.
- Implemented parallelized data-processing operations using the Dask framework to clean and filter text data (see the Dask sketch below).
- Implemented ML algorithms to extract the required information accurately at scale.
- Built ML-based optimizers for contact sourcing to retrieve client-focused results and tag searches.
Technologies used: Python, asyncio, Dask, BeautifulSoup, Requests, JSON, Selenium, Scrapy, Matplotlib, pandas, AWS, MongoDB, XGBoost, NLP, NER, PySpark
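A minimal sketch of the parallelized text cleaning with Dask mentioned above; the input and output globs are hypothetical.

    import dask.bag as db

    def clean(line):
        # Normalize case and whitespace
        return " ".join(line.lower().split())

    # Read scraped text files in parallel, clean each line, drop empties
    bag = db.read_text("scraped/*.txt")
    cleaned = bag.map(clean).filter(lambda s: len(s) > 0)
    cleaned.to_textfiles("cleaned/*.txt")  # one output part per partition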
Confidential
Data Analyst
Responsibilities:
- Created a structured data pipeline with 40+ integrations of various data sources to filter, transform, and validate the inflow of raw data.
- Performed data cleaning, preprocessing, and transformations, and built predictive models.
- Performed targeted analysis of sales and customer acquisition.
- The goal was to find key insights and opportunities for leveraging the data intelligently, improving customer targeting and overall data value to increase sales.
- Performed RFM analysis (see the pandas sketch below), customer-churn prediction, recommendation systems, association rule mining, data enrichment, and data-quality improvement.
Technologies used: Python, GraphLab, NumPy, pandas, scikit-learn, TensorFlow, Keras, Tableau, Chart.js, D3.js
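A minimal pandas sketch of the RFM (recency, frequency, monetary) analysis mentioned above, run on a tiny made-up order table; real scoring schemes vary.

    import pandas as pd

    # Made-up order history: one row per order
    orders = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3, 3],
        "order_date": pd.to_datetime(["2024-01-05", "2024-03-01", "2024-02-10",
                                      "2024-01-20", "2024-02-25", "2024-03-15"]),
        "amount": [50.0, 20.0, 200.0, 35.0, 35.0, 80.0],
    })

    now = orders["order_date"].max()
    rfm = orders.groupby("customer_id").agg(
        recency=("order_date", lambda d: (now - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Tercile scores; ranking first guarantees unique bin edges on small data
    rfm["r_score"] = pd.qcut(rfm["recency"].rank(method="first"), 3, labels=[3, 2, 1])
    rfm["f_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3])
    rfm["m_score"] = pd.qcut(rfm["monetary"].rank(method="first"), 3, labels=[1, 2, 3])
    print(rfm)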
Confidential
Responsibilities:
- Developed robust machine learning models to predict the direction of cryptocurrency price movements.
- Instrumental in creating the infrastructure for the project's complete pipeline.
- Provided a framework for identifying key features for stacked models (see the stacking sketch below).
- Identified key features of price-direction movement useful for day traders.
Technologies used: Generative & Discriminative Models, Python, MongoDB, Neural Network, Bitcoin, Quandl
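A minimal scikit-learn sketch of a stacked model of the kind referenced above; the data is synthetic, since the real engineered market features are not shown here.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for engineered price/volume features
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Two base learners; a logistic meta-learner combines their predictions
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(),
    )
    stack.fit(X_train, y_train)
    print("held-out accuracy:", stack.score(X_test, y_test))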
Confidential
Data Engineer
Responsibilities:
- Participated in all phases of the project life cycle, including data collection, data mining, data cleaning, model development, validation, and report creation.
- Implemented business intelligence dashboards using Tableau, producing different summary results based on requirements and user roles.
- Utilized MapReduce and PySpark programs to process data for analysis reports.
- Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and Numpy.
- Performed data preprocessing on messy data, including imputation, normalization, scaling, and feature engineering, using scikit-learn.
- Conducted exploratory data analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlations between features.
- Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and ensemble algorithms to predict the probability of patient absence (see the cross-validation sketch below).
- Used metrics such as F-score, ROC, and AUC to evaluate the performance of each model, and K-fold cross-validation to test the models on different batches of data and optimize them.
- Implemented and tested the model on AWS EC2 and collaborated with development team to get the best algorithm and parameters.
- Performed data visualization, designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
Environment: Microsoft SQL Server, SQL Server Management Studio, T-SQL, MLlib, MapReduce, Python, JIRA, AWS, and Tableau.
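A minimal sketch of the evaluation workflow described above, pairing a classifier with K-fold cross-validation scored by ROC AUC; synthetic data stands in for the patient dataset.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic, imbalanced stand-in for the patient data
    X, y = make_classification(n_samples=2000, n_features=15,
                               weights=[0.8, 0.2], random_state=0)

    # 5-fold cross-validation with ROC AUC as the metric
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print("mean ROC AUC over 5 folds:", scores.mean())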
Confidential
Jr. Data Scientist
Responsibilities:
- Responsible for applying machine-learning techniques (regression/classification) to predict outcomes.
- Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize datasets in preparation for modeling.
- Designed and automated the process of score cuts that achieved increased close and good rates using advanced R programming.
- Utilized standard Python modules such as csv, itertools, and pickle for development.
- Analyzed large datasets to answer business questions by generating reports and outcomes.
- Worked in a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
- Executed SQL queries from R/Python on complex table configurations (see the SQLAlchemy sketch below).
- Retrieved data from databases through SQL as per business requirements.
- Created, maintained, modified, and optimized SQL Server databases.
- Manipulated data using Python programming.
- Adhered to best practices for project support and documentation.
- Understood the business problem, built hypotheses, and validated them using the data.
- Managed the reporting/dashboarding for the key metrics of the business.
- Involved in data analysis using different analytic and modeling techniques.
Environment: R, Python (NumPy, SciPy, pandas, scikit-learn, NLTK), SQL, exploratory analysis, feature engineering, machine learning, NLP, Tableau.
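A minimal sketch of executing SQL from Python as described above, using SQLAlchemy and pandas; the connection string and table are hypothetical.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; substitute the real SQL Server DSN/credentials
    engine = create_engine("mssql+pyodbc://user:password@my_dsn")

    query = """
        SELECT customer_id, SUM(amount) AS total_spend
        FROM sales
        GROUP BY customer_id
    """
    df = pd.read_sql(query, engine)  # results land in a DataFrame
    print(df.head())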
