- A data science professional with 6 years of experience in data analytics, statistical modeling, visualization, and machine learning, with strong capabilities in collaboration, quick learning, and adaptation.
- Experience in data mining with large structured and unstructured datasets, data acquisition, data validation, predictive modeling, and data visualization.
- Wrote Python scripts to parse XML documents and load the data into a database, and developed web-based applications using Python, CSS, and HTML.
- Developed applications with XML, JSON, and XSL using PHP, Django (Python), and Rails.
- Experienced in developing web services with the Python programming language.
- Experience writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
- Cleaned and processed third-party spending data into workable deliverables in specific formats using Excel macros and Python libraries.
- Experienced with various Python IDEs, including PyCharm, PyScripter, Spyder, PyStudio, and PyDev.
- Good experience in Python software development using PyCharm, Sublime Text, and Jupyter Notebook.
- Experienced in developing server-side rendered web applications with Django/Python and HTML/CSS.
- Hands-on experience working with the WAMP (Windows, Apache, MySQL, Python/PHP) and LAMP (Linux, Apache, MySQL, Python/PHP) architectures.
- Worked in the Anaconda Python environment.
- Developed views and templates with Python and Django's view controllers and templating language to create a user-friendly website interface.
- Experience implementing Python with various libraries such as matplotlib for charts and graphs, MySQLdb for database connectivity, python-twitter, PySide, pickle, pandas DataFrames, NetworkX, and urllib2.
- Experienced in using Python libraries such as BeautifulSoup, NumPy, SciPy, matplotlib, python-twitter, NetworkX, urllib2, and MySQLdb for database connectivity, and with the Sublime Text, Spyder, and PyCharm IDEs.
- Experienced in requirements gathering, use case development, business process flows, and business process modeling.
- Hands-on experience with Python libraries such as NumPy, pandas, Matplotlib, Seaborn, NLTK, scikit-learn, and SciPy.
- Good exposure to deep learning with TensorFlow in Python.
- Good knowledge of Natural Language Processing (NLP) and of time series analysis and forecasting using ARIMA models in Python and R.
- Good knowledge of Tableau and Power BI for interactive data visualizations.
- In-depth understanding of NoSQL databases such as MongoDB and HBase.
- Very good experience and knowledge in provisioning virtual clusters on the AWS cloud, including the EC2, S3, and EMR services.
- Experience and knowledge in developing software with Java and C++ (data structures and algorithms).
- Good exposure to creating pivot tables and charts in Excel.
- Experience developing custom reports and various tabular, matrix, ad hoc, and distributed reports in multiple formats using SQL Server Reporting Services (SSRS).
- Excellent database administration (DBA) skills, including user authorization, database creation, tables, indexes, and backup creation.
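The XML-to-database loading described above can be sketched roughly as follows; the `<user>` schema, table name, and use of SQLite are illustrative assumptions, not details from the original projects.

```python
# Sketch of parsing an XML document and loading the data into a database,
# using only the standard library. The <user> schema is a made-up example.
import sqlite3
import xml.etree.ElementTree as ET

XML_DOC = """
<users>
  <user><name>Alice</name><email>alice@example.com</email></user>
  <user><name>Bob</name><email>bob@example.com</email></user>
</users>
"""

def load_users(xml_text, conn):
    """Parse <user> elements and bulk-insert them into a users table."""
    root = ET.fromstring(xml_text)
    rows = [(u.findtext("name"), u.findtext("email")) for u in root.iter("user")]
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
print(load_users(XML_DOC, conn))  # prints 2
```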
Languages: Java 8, Python, R
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, C5.0, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, Seaborn, SciPy, matplotlib, scikit-learn, BeautifulSoup, Rpy2
Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL
Data Modelling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka
Databases: SQL Server, MySQL, MS Access
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0
ETL Tools: Informatica PowerCenter, SSIS
Version Control Tools: SVN, GitHub
BI Tools: Tableau, Tableau Server, Tableau Reader, Amazon Redshift
Operating System: Windows, Linux, Unix, macOS, Red Hat
Confidential - New Jersey
Sr. Python Developer
- Implemented data exploration to analyze patterns and select features using Python's SciPy.
- Built factor analysis and cluster analysis models using Python's SciPy to classify customers into different target groups.
- Performed extensive graphical visualization of the overall data using R and Python with the ggplot2 package, including customized graphical representations of revenue reports and item-specific sales statistics.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Participated in data acquisition with the data engineering team to extract historical and real-time data using the Python library pandas.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMSs and web logs into SQL and NoSQL databases.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and loaded it into the database.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Developed Hive queries for analysis, and exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Created HBase tables to store various data formats of data coming from different portfolios.
- Worked on improving performance of existing Pig and Hive Queries.
- Created reports and dashboards using D3.js and Tableau 9.x to explain and communicate data insights, significant features, model scores, and the performance of the new recommendation system to both technical and business teams.
- Utilized SQL, Excel, and several marketing/web analytics tools (Google Analytics, AdWords) to complete business and marketing analysis and assessment.
- Used Git 2.x for version control in collaboration with data engineer and data scientist colleagues.
- Used the Agile methodology and the Scrum process for project development.
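The SciPy-based cluster analysis used to segment customers (see the bullets above) could be sketched as follows; the two-feature toy data and the choice of k = 2 are illustrative assumptions.

```python
# Minimal sketch of customer segmentation with SciPy's k-means; the
# synthetic two-feature "customer" data below is a stand-in assumption.
import numpy as np
from scipy.cluster.vq import kmeans2, whiten

rng = np.random.default_rng(0)
# Two synthetic customer groups: low spenders and high spenders.
low = rng.normal(loc=[20.0, 1.0], scale=0.5, size=(50, 2))
high = rng.normal(loc=[80.0, 9.0], scale=0.5, size=(50, 2))
features = whiten(np.vstack([low, high]))  # normalize per-feature variance

# k-means++ initialization, two target groups.
centroids, labels = kmeans2(features, k=2, seed=0, minit="++")
# Each customer is now assigned to one of the two segments.
print(sorted(np.bincount(labels).tolist()))
```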
Confidential - Long Island, NY
- Performed data profiling to learn about behavior across various features such as traffic patterns, location, and date and time.
- Extracted the data from hive tables by writing efficient Hive queries.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
- Analyzed data and performed data preparation by applying a historical model to the dataset in Tableau.
- Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
- Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments of customers through a discovery approach.
- Developed Spark/Scala, Python, and R code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data sources. Used K-means clustering to identify outliers and classify unlabeled data.
- Evaluated models using cross-validation, the log-loss function, and ROC curves; used AUC for feature selection; and worked with Elastic technologies such as Elasticsearch and Kibana.
- Worked with the NLTK library for NLP data processing and pattern discovery.
- Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
- Analyzed traffic patterns by calculating autocorrelation at different time lags.
- Ensured that the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Performed multinomial logistic regression, random forest, decision tree, and SVM modeling to classify whether a package would be delivered on time on a new route.
- Implemented models such as logistic regression, random forest, and gradient-boosted trees to predict whether a given die would pass or fail the test.
- Performed data analysis using Hive to retrieve data from an Oracle database and used ETL for data transformation.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Developed a MapReduce pipeline for feature extraction using Hive and Pig.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
- Communicated the results to the operations team to support the best decisions.
- Collected data needs and requirements by interacting with other departments.
Environment: Python 2.x, R, HDFS, Hadoop 2.3, Hive, Linux, Spark, IBM SPSS, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
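The modeling and evaluation workflow above (logistic regression with L2 regularization, scored by cross-validated ROC AUC) can be illustrated with a small scikit-learn sketch; the synthetic pass/fail dataset stands in for the real data.

```python
# Hedged sketch of the classification workflow: L2-regularized logistic
# regression evaluated with cross-validated ROC AUC. The synthetic data
# is an assumption standing in for the real pass/fail dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# penalty="l2" applies the regularization used to curb overfitting;
# C is the inverse of the regularization strength.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean())  # cross-validated AUC
```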
Confidential - Richmond, VA
- Worked in comprehending and examining the client's business requirements.
- Used the Django framework and Python to build dynamic web pages.
- Developed tools for monitoring and notification using Python.
- Enhanced the application by using HTML and JavaScript for design and development.
- Used data structures such as dictionaries and tuples, along with object-oriented class-based inheritance, to implement complex network algorithms.
- Created a PHP/MySQL back end for data entry from Flash and worked in tandem with the Flash developer to obtain the correct data through query strings.
- Involved in designing the database model, APIs, and views using Python to build an interactive web-based solution.
- Generated Python Django Forms to record data of online users.
- Implemented Data tables to add, delete, update and display patient records and policy information using PyQt.
- Implemented a module to connect to and view the status of an Apache Cassandra instance using Python.
- Developed MVC prototype replacement of current product with Django.
- Improved the Data Security and generated report efficiently by caching and reusing data.
- Managed datasets using pandas DataFrames and MySQL; queried the database using the Python MySQL connector and retrieved information using MySQLdb.
- Recorded online users' data using Python Django forms and implemented test cases using pytest.
- Developed the application using the test-driven development (TDD) methodology and designed unit tests using the Python unittest framework.
- Created a web application prototype using jQuery and AngularJS.
- Deployed the project to Heroku using the Git version control system.
- Maintained and updated the application in accordance with client requirements.
- Used Python modules such as csv, robotparser, itertools, pickle, Jinja2, and xml for development.
- Implemented employee registration and login queries using Python and PostgreSQL.
- Used PostgreSQL's multi-row storage strategy, multi-version concurrency control (MVCC), to keep the database responsive when querying and storing data.
- Handled document databases and video files for an online learning management system while building online learning management solutions.
- Automated RabbitMQ cluster installations and configuration using Python/Bash.
- Used the pandas API to arrange data in time series and tabular formats for easy timestamp-based data manipulation and retrieval.
- Fetched Twitter feeds for certain important keywords using the python-twitter library.
- Used the Python library BeautifulSoup for web scraping to extract data for building graphs.
- Performed troubleshooting and deployed many Python bug fixes for the Learning Management System.
- Used Python Flask framework to build modular & maintainable applications.
- Automated data movements using Python scripts.
- Involved in splitting, validating and processing of files.
- Created a core Python API to be used across multiple modules.
- Uploaded statistics to MySQL for analysis and logging.
- Developed complex SQL queries for testing the database functionality.
- Used UNIX server for application deployment and configuration.
- Wrote shell scripts for automation.
- Provided technical assistance for maintenance, integration and testing of software solutions during development and release processes.
- Created unit test/regression test framework for working/new code.
Environment: Python, Django, Linux, HTML, CSS, shell scripting, PostgreSQL, MySQL, python-twitter, Flask, web services, SVN, pandas, FileZilla, etc.
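The BeautifulSoup extraction mentioned above might look like this minimal sketch, run against an inline HTML snippet rather than a live page; the table structure and tag names are illustrative assumptions.

```python
# Sketch of extracting tabular data for graphing with BeautifulSoup;
# the inline HTML stands in for a scraped page.
from bs4 import BeautifulSoup

HTML = """
<html><body>
  <table id="sales">
    <tr><td>Q1</td><td>120</td></tr>
    <tr><td>Q2</td><td>150</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = []
for tr in soup.find("table", id="sales").find_all("tr"):
    cells = tr.find_all("td")
    # Each row becomes a (label, value) pair ready for plotting.
    rows.append((cells[0].get_text(), int(cells[1].get_text())))
print(rows)  # [('Q1', 120), ('Q2', 150)]
```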