
Data Analyst Resume


Nashville, TN

SUMMARY

  • Over 6 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, and analytics models.
  • Experienced in utilizing analytical applications like R and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into risk management and marketing strategies that drive value.
  • Expertise in Model Development, Data Mining, Predictive Modeling, Data Visualization, Data Cleansing and Management, and Database Management.
  • Proficient in Hadoop, Hive, MapReduce, Pig, and NoSQL databases like MongoDB, HBase, and Cassandra.
  • Excellent experience in SQL*Loader, SQL data modeling, reporting, and SQL database development; loaded data from legacy systems into Oracle databases using control files, and used the Oracle External Tables feature to read data from flat files into Oracle staging tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research; comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Experienced in Data Modeling with RDBMS concepts, Logical and Physical Data Modeling up to Third Normal Form (3NF), and Multidimensional Data Modeling schemas (Star schema, Snowflake schema, facts and dimensions).
  • Expertise in Excel macros, Pivot Tables, VLOOKUP, and other advanced functions; expert R user with knowledge of the SAS statistical programming language.
  • Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica PowerCenter.
  • Used the Python unittest framework for developing and implementing unit tests with a test-driven approach (see the sketch after this list).
  • Analyzed instrument pricing and modeling methodologies and documented how instrument prices move with changes in market data sources.
  • Good experience in data warehousing for high-volume data processing.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, and other advanced statistical techniques.
  • Extensive knowledge and experience in producing tables, reports, graphs and listings using various procedures and handling large databases to perform complex data manipulations.
  • Solid ability to meet tight deadlines while maintaining accuracy; excellent communication and organizational skills.
  • Solid understanding of business operations and analytics tools for effective analysis of data.
  • Excellent communication and interpersonal skills; a strong team player and quick learner with a can-do attitude.
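
A minimal sketch of the unittest usage referenced above; the helper function and test names are hypothetical examples for illustration, not from any specific project.

```python
import unittest

def clean_amount(raw):
    """Hypothetical helper: normalize a currency string like '$1,200.50' to a float."""
    return float(raw.replace("$", "").replace(",", ""))

class TestCleanAmount(unittest.TestCase):
    # Tests written first, TDD-style, to pin down the expected behavior.
    def test_strips_symbols(self):
        self.assertEqual(clean_amount("$1,200.50"), 1200.50)

    def test_plain_number(self):
        self.assertEqual(clean_amount("42"), 42.0)

if __name__ == "__main__":
    unittest.main()
```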

TECHNICAL SKILLS

Programming Languages: R, Python, Scala, SQL

Scripting Languages: Python (NumPy, SciPy, Pandas, Keras), R (caret, Weka, ggplot2)

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Kafka, Sqoop, Oozie, Spark, and Scala.

Reporting Tools: Shiny, Tableau.

Operating Systems: Windows, UNIX, MS DOS, Sun Solaris

PROFESSIONAL EXPERIENCE

Confidential, Nashville, TN

Data Analyst

RESPONSIBILITIES:

  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large audit programs.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and implemented a Python-based distributed random forest via Python streaming.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python to develop machine learning algorithms such as linear regression, multivariate regression, naive Bayes, random forests, k-means, and KNN for data analysis (see the scikit-learn sketch after this list).
  • Involved in analyzing data coming from various sources and creating meta-files and control files to ingest the data into the Data Lake; involved in configuring batch jobs to perform ingestion of the source files into the Data Lake.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; performed export and import of data to and from S3.
  • Worked on monitoring and troubleshooting the Kafka-Storm-HDFS data pipeline for real-time data ingestion into the Data Lake in HDFS.
  • Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize the quality and performance of data.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Responsible for developing data pipelines on AWS to extract data from weblogs and store it in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the PySpark sketch after this list).
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Implemented data consolidation using Spark and Hive to generate data in the required formats, applying various ETL tasks (data repair, massaging data to identify sources for audit purposes, data filtering) and storing the results back to HDFS.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python for a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Designed and implemented the system architecture for an Amazon EC2 based cloud-hosted solution for a client.
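
A hedged sketch of the scikit-learn workflow referenced above; the iris dataset and model settings are illustrative stand-ins, not the project's actual data or parameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the project's data

# Compare several of the algorithms named above with 5-fold cross-validation.
models = {
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```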
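And a minimal PySpark sketch of the Hive-to-Spark conversion pattern described above; the table name, column names, and HDFS path are hypothetical, and a Spark session with Hive support is assumed.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes Hive support is configured; table/column names are hypothetical.
spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# A Hive query such as
#   SELECT region, SUM(amount) FROM weblogs GROUP BY region
# expressed as a DataFrame transformation instead:
df = spark.table("weblogs")
agg = (df.filter(F.col("amount") > 0)          # data filtering
         .groupBy("region")
         .agg(F.sum("amount").alias("total")))

# Store the consolidated result back to HDFS (path is illustrative).
agg.write.mode("overwrite").parquet("hdfs:///data/consolidated/weblogs")
```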

ENVIRONMENT: Python, SQL, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential, Chicago, IL

Data Analyst

RESPONSIBILITIES:

  • Involved in the entire data science project life cycle, actively participating in all phases including data extraction, data cleaning, statistical modeling, and data visualization with large data sets of structured and unstructured data.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Applied breadth of knowledge in programming (Python, R); descriptive, inferential, and experimental design statistics; advanced mathematics; and database functionality (SQL, Hadoop).
  • Worked with machine learning algorithms such as logistic regression, random forest, boosting, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
  • Transformed the Logical Data Model into a Physical Data Model in Erwin, ensuring Primary Key and Foreign Key relationships in the PDM, consistency of data attribute definitions, and Primary Index considerations.
  • Worked on real-time data processing with Spark/Storm and Kafka using Scala; wrote Scala programs using Spark on YARN for analyzing data and Spark/Spark SQL for performing aggregations; developed web services in the Play framework using Scala to build a streaming data platform (see the sketch after this list).
  • Developed and implemented SSIS, SSRS, and SSAS application solutions for various business units across the organization; created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
  • Responsible for developing data pipelines on AWS to extract data from weblogs and store it in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
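
The streaming work above is described in Scala; as a hedged Python (PySpark Structured Streaming) analogue of the Kafka-to-Spark pattern, the broker address, topic name, and per-minute aggregation below are assumptions, and running it requires Spark's Kafka connector package.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# PySpark analogue of the Scala streaming work; names are assumptions.
spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Kafka values arrive as bytes; cast to string, then aggregate per minute.
counts = (events.select(F.col("value").cast("string").alias("payload"),
                        F.col("timestamp"))
          .withWatermark("timestamp", "1 minute")
          .groupBy(F.window("timestamp", "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```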

ENVIRONMENT: Python, SQL, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, Tableau, XML, Cassandra, MapReduce, AWS.

Confidential

Data Analyst

RESPONSIBILITIES:

  • Explored the insurance claims data, found the patterns, groups, and regions where claims were higher, and compared them using pie charts and bar graphs.
  • Involved in designing and developing a predictive model that predicts false claims from historical claims data with 85% accuracy; implemented it in R and performed A/B testing.
  • Designed a model to predict potential claimants (those who claim more than a specific amount) from the company's claims data using Logistic Regression, Decision Trees, and Random Forest.
  • Performed credit risk predictive modeling using Decision Trees and Regressions to assess the risk involved by assigning individual scores to customers.
  • Addressed overfitting and underfitting by tuning the algorithms' hyperparameters and by using L1 and L2 regularization (see the sketch after this list).
  • Used Django's database APIs to access database objects.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Worked extensively with Bootstrap, JavaScript, and jQuery to optimize the user experience.
  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
  • Involved in the development of Web Services using SOAP to send and receive data from the external interface in XML format.
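
A hedged sketch of the L1/L2 regularization tuning described above; synthetic data stands in for the claims dataset, and the parameter grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the historical claims data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Tune the penalty type (L1 vs. L2) and strength C to curb
# overfitting/underfitting; liblinear supports both penalties.
grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV accuracy:", round(grid.best_score_, 3))
```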

ENVIRONMENT: SQL, logistic regression, Hadoop, Hive, random forest, SVM, JSON, Tableau, XML, AWS.

Confidential

Software Engineer

RESPONSIBILITIES:

  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Used Agile methodology for software development.
  • Involved in the design phase and data modeling; interacted with other team members to understand the requirements for the project.
  • Worked with the web admin and the admin team to configure the application on development, training, test, and stress environments (WebLogic Server).
  • Designed, developed, tested, deployed, and maintained the website.
  • Designed and developed the UI of the website using HTML, AJAX, CSS, and JavaScript.
  • Designed and developed a data management system using MySQL.
  • Rewrote existing Python/Django modules to deliver data in specific formats.
  • Used Django's database APIs to access database objects.
  • Wrote Python scripts to parse XML documents and load the data into the database (see the sketch after this list).
  • Handled all client-side validation using JavaScript.
  • Worked extensively with Bootstrap, JavaScript, and jQuery to optimize the user experience.
  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
  • Involved in the development of Web Services using SOAP to send and receive data from the external interface in XML format.
  • Responsible for debugging the project, monitored in JIRA (Agile).
  • Used jQuery for all client-side JavaScript manipulation.
  • Created a unit test/regression test framework for working and new code.
  • Used the Subversion version control tool to coordinate team development.
  • Built the development environment with JIRA and Stash/Git.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Responsible for debugging and troubleshooting the web application.
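
A minimal sketch of the XML-parsing scripts mentioned above; the XML layout and table name are hypothetical, and SQLite stands in for the project's MySQL database so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout for illustration.
XML_DOC = """
<records>
  <record><id>1</id><name>alice</name></record>
  <record><id>2</id><name>bob</name></record>
</records>
"""

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL database
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")

# Parse the document and bulk-insert one row per <record> element.
root = ET.fromstring(XML_DOC)
rows = [(int(r.findtext("id")), r.findtext("name")) for r in root.iter("record")]
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT * FROM records").fetchall())
```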

ENVIRONMENT: Python 2.6, SciPy, Pandas, Bugzilla, SVN, C++, jQuery, MS SQL, Visual Basic, Linux, Eclipse IDE, JavaScript, XML, JASPER, PL/SQL, Oracle 9i, Shell Scripting, HTML5/CSS, Apache, Java, Struts, Spring, Swing/JFC, JSP, WebLogic Server, SOAP, Maven, WSDL, JAX-WS, Apache Axis.
