
Data Scientist Resume


Chicago, IL

SUMMARY

  • 8+ years of experience in IT, including 5+ years as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
  • Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction (a minimal example follows this summary).
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
  • Expertise in the implementation of core Java and JEE technologies: JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
  • Extensive experience working in Test-Driven Development and Agile-Scrum development.
  • Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
  • Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos.
  • Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Experience in data migration from existing data stores to Hadoop.
  • Developed MapReduce programs to perform data transformation and analysis.
  • Experience in analyzing data with Hive and Pig using schema-on-read.
  • Created development environments in Amazon Web Services using services such as VPC, ELB, EC2, ECS, and RDS.
  • Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Proficient in data science programming in R, Python, and SQL.
  • Proficient in SQL, databases, data modeling, data warehousing, ETL, and reporting tools.
  • Strong knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Proficient in using AJAX for implementing dynamic web pages.
  • Solid team player, team builder, and excellent communicator.
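
A minimal sketch of the Spark MLlib usage summarized above, using the DataFrame-based pyspark.ml API; the data, column names, and app name are illustrative assumptions, not taken from an actual project:

    # Sketch: train and apply a Spark ML logistic regression model.
    # Data, column names, and app name are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
    df = spark.createDataFrame(
        [(1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (5.0, 6.0, 1.0), (6.0, 5.0, 1.0)],
        ["f1", "f2", "label"],
    )
    assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)
    model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembled)
    model.transform(assembled).select("label", "prediction").show()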

TECHNICAL SKILLS

Languages: Java 8, Python, R

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.

Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL

Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools: Informatica PowerCenter, SSIS.

Version Control Tools: SVN, GitHub.

Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Operating Systems: Windows, Linux, UNIX, macOS, Red Hat.

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • Identified the customer and account attributes required for MDM implementation from disparate sources and prepared detailed documentation.
  • Performed data profiling and analysis on the different source systems required for the Customer Master.
  • Worked closely with the Data Governance Office team in assessing the source systems for project deliverables.
  • Used Confidential-SQL queries to pull data from disparate systems and the data warehouse in different environments.
  • Used data quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies (a minimal validation sketch follows this list).
  • Extensively used the open-source tools RStudio (R) and Spyder (Python) for statistical analysis and building machine learning models.
  • Involved in defining source-to-target data mappings, business rules, and data definitions.
  • Presented DQ analysis reports and scorecards on all validated data elements to the business teams and stakeholders.
  • Performed data validation and data reconciliation between disparate source and target systems (Salesforce, Cisco UIC, Cognos, data warehouse) for various projects.
  • Interacted with business teams and project managers to clearly articulate anomalies, issues, and findings during data validation.
  • Wrote complex SQL queries to validate the data against different kinds of reports generated by Cognos.
  • Extracted data from different databases per the business requirements using SQL Server Management Studio.
  • Interacted with the ETL and BI teams to understand and support various ongoing projects.
  • Extensively used MS Excel for data validation.
  • Generated weekly and monthly reports for various business users according to the business requirements.
  • Manipulated and mined data from database tables (Redshift, Oracle, data warehouse).
  • Provided analytical network support to improve quality and standardize work results.
  • Created statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
  • Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
  • Utilized a broad variety of statistical and big data packages, including SAS, R, MLlib, Hadoop, Spark, and MapReduce.
  • Provided input and recommendations on technical issues to business and data analysts, BI engineers, and data scientists.
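
A minimal sketch of the Critical Data Element validation described in this list, using pandas; the column names and rules are hypothetical (the real CDEs and thresholds were project-specific):

    # Sketch: CDE validation with pandas. Columns and rules are hypothetical.
    import pandas as pd

    def validate_cdes(df: pd.DataFrame) -> pd.DataFrame:
        """Return one row per rule with pass/fail counts."""
        rules = {
            "customer_id_not_null": df["customer_id"].notna(),
            "email_format_ok": df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False),
            "open_date_parses": pd.to_datetime(df["account_open_date"], errors="coerce").notna(),
        }
        return pd.DataFrame(
            [{"rule": name, "passed": int(mask.sum()), "failed": int((~mask).sum())}
             for name, mask in rules.items()]
        )

    # Example: report = validate_cdes(pd.read_sql("SELECT * FROM customer_master", conn))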

Environment: Data Governance, SQL Server, ETL, MS Office Suite - Excel (Pivot Tables, VLOOKUP), DB2, R, Python, Visio, HP ALM, Agile, Spyder, Word, Azure, MDM, SharePoint, Data Quality, Tableau, and Reference Data Management.

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Worked on analyzing data from Google Analytics, AdWords, Facebook, etc.
  • Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection; worked with Elastic technologies such as Elasticsearch and Kibana.
  • Performed data profiling to learn about user behavior with various features such as traffic pattern, location, date, and time.
  • Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
  • Used Python scripts to update content in the database and manipulate files.
  • Used dplyr (R) and pandas (Python) for exploratory data analysis.
  • Applied multinomial logistic regression, decision tree, random forest, and SVM models to classify whether a package would be delivered on time for a new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
  • Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
  • Developed Spark/Scala, R, and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources.
  • Used the K-Means clustering technique to identify outliers and to classify unlabeled data.
  • Tracked operations with Airflow sensors until certain criteria were met.
  • Responsible for data mapping activities from source systems to Teradata using utilities such as TPump, FEXP, BTEQ, MLOAD, and FLOAD.
  • Analyzed traffic patterns by calculating autocorrelation at different time lags.
  • Ensured that the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Addressed overfitting by implementing regularization methods such as L1 and L2 (see the sketch after this list).
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Designed and created reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Developed a MapReduce pipeline for feature extraction using Hive and Pig.
  • Communicated results to the operations team to support better decisions.
  • Collected data needs and requirements by interacting with other departments.
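
A minimal sketch of the regularization and evaluation approach described above (L1/L2-penalized logistic regression scored with cross-validated AUC and log loss), using scikit-learn on placeholder arrays standing in for the real feature matrix:

    # Sketch: compare L1 and L2 regularization with cross-validated AUC / log loss.
    # X and y are placeholder arrays; the real features came from the Hadoop/Oracle extracts.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

    for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
        model = LogisticRegression(penalty=penalty, C=1.0, solver=solver, max_iter=1000)
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        log_loss = -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()
        print(f"{penalty}: AUC={auc:.3f}, log loss={log_loss:.3f}")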

Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential, Washington, District of Columbia

Data Analyst

Responsibilities:

  • Worked with the BI team in gathering report requirements and used Sqoop to move data into HDFS and Hive.
  • Involved in the following phases of analytics using R, Python, and Jupyter notebooks:
  • Data collection and treatment: analyzed existing internal and external data, addressed entry and classification errors, and defined criteria for missing values.
  • Data mining: used cluster analysis to identify customer segments (a minimal clustering sketch follows this list).
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Installed, configured, and managed the Flume infrastructure.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Worked closely with the claims processing team to identify patterns in the filing of fraudulent claims.
  • Performed a major upgrade of the cluster from CDH3u6 to CDH4.4.0.
  • Developed MapReduce programs to extract and transform the data sets, and exported the results back to the RDBMS using Sqoop.
  • Observed patterns in fraudulent claims using text mining in R and Hive.
  • Exported the required information to the RDBMS using Sqoop to make the data available to the claims processing team for processing claims.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
  • Adept in statistical programming languages such as R and Python, as well as big data technologies such as Hadoop and Hive.
  • Experience working as a data engineer, big data Spark developer, front-end developer, and research assistant.
  • Created tables in Hive and loaded the structured data resulting from MapReduce jobs.
  • Developed many queries using HiveQL and extracted the required information.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Responsible for importing data (mostly log files) from various sources into HDFS using Flume.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
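
A minimal sketch of the cluster-analysis step used for customer segmentation, with hypothetical numeric features (the real segmentation variables came from the Hive/EDW extracts):

    # Sketch: K-Means customer segmentation on hypothetical features.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Placeholder customer features; real inputs came from Hive/EDW extracts.
    customers = pd.DataFrame({
        "claims_filed":  [1, 0, 3, 7, 2, 0, 5, 8],
        "tenure_years":  [2, 5, 1, 0.5, 4, 7, 1, 0.3],
        "avg_claim_amt": [200, 0, 900, 4200, 350, 0, 2500, 5100],
    })

    X = StandardScaler().fit_transform(customers)
    customers["segment"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
    print(customers.groupby("segment").mean())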

Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, R, VMware, Eclipse, Cloudera, Python.

Confidential, San Francisco, CA

Python Developer

Responsibilities:

  • Developed a portal to manage entities in a content management system using Flask.
  • Designed the database schema for the content management system.
  • Designed email marketing campaigns and created responsive web forms that saved data into a database using Python and the Django framework.
  • Worked on Hadoop single-node, Apache Spark, and Hive installations.
  • Developed views and templates in Django to create a user-friendly website interface.
  • Configured Django to manage URLs and application parameters.
  • Supported MapReduce programs running on the cluster.
  • Worked with CSV files while getting input from the MySQL database.
  • Wrote programs for performance calculations using NumPy and SQLAlchemy.
  • Administered and monitored a multi-datacenter Cassandra cluster based on an understanding of the Cassandra architecture.
  • Worked extensively with Informatica in designing and developing ETL processes to load data from XML sources into the target database.
  • Designed and automated the installation and configuration of secure DataStax Enterprise Cassandra using Chef.
  • Wrote Python scripts to parse XML documents and load the data into the database (a minimal sketch follows this list).
  • Worked in stages such as analysis and design, development, testing, and debugging.
  • Built more user-interactive web pages using jQuery plugins for drag-and-drop and autocomplete, along with JSON, AngularJS, and JavaScript.
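
A minimal sketch of a script of the kind described above for parsing XML and loading the data into MySQL via SQLAlchemy; the file name, element names, connection string, and target table are hypothetical:

    # Sketch: parse an XML document and load rows into MySQL via SQLAlchemy.
    # File name, element/field names, DSN, and target table are hypothetical.
    import xml.etree.ElementTree as ET
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://user:pass@localhost/appdb")  # placeholder DSN

    def load_xml(path: str) -> int:
        root = ET.parse(path).getroot()
        rows = [
            {"name": item.findtext("name"), "value": item.findtext("value")}
            for item in root.iter("item")
        ]
        if rows:
            with engine.begin() as conn:
                conn.execute(
                    text("INSERT INTO items (name, value) VALUES (:name, :value)"),
                    rows,
                )
        return len(rows)

    # Example: print(load_xml("feed.xml"), "rows loaded")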

Environment: Python 2.7, Windows, MySQL, ETL, Ansible, Flask, and Python libraries such as NumPy, SQLAlchemy, and MySQLdb; AngularJS.

Confidential

SAS Programmer

Responsibilities:

  • Analyzed high-volume, high-dimensional client and survey data from different sources using SAS and R.
  • Manipulated large financial datasets, primarily in SQL and R.
  • Used R for large matrix computations.
  • Developed algorithms (data mining queries) to extract data from the data warehouse and databases to build rules for the analyst and models team.
  • Used R to import high volumes of data.
  • Highly efficient in the use of statistical modeling tools such as SAS, SPSS, and R.
  • Developed predictive models in R to predict customer churn and classify customers (an illustrative sketch follows this list).
  • Worked on a Shiny (R) application displaying machine learning results to improve business forecasting.
  • Developed, reviewed, tested, and documented SAS programs and macros.
  • Created templates using SAS macros for existing reports to reduce manual intervention.
  • Created self-service data retrieval tools for the onshore/offshore teams.
  • Worked on daily reports and used them for further analysis.
  • Developed and designed templates for new data extraction requests.
  • Executed weekly reports for the Commercial Data Analytics team.
  • Communicated progress to key business partners and analysts through status reports and tracked issues until resolution.
  • Created predictive and other analytically derived models for assessing sales.
  • Provided support in the design and implementation of ad hoc requests for sales-related portfolio data.
  • Responsible for preparing test case documents and technical specification documents.
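
The churn models above were built in R; purely as an illustration, the same idea expressed in Python (scikit-learn) with hypothetical customer fields looks roughly like this:

    # Illustrative only: the original churn models were developed in R/SAS.
    # Fields and values are hypothetical; real inputs came from warehouse extracts.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.DataFrame({
        "tenure_months": [3, 40, 12, 60, 5, 24, 2, 48],
        "monthly_spend": [80, 20, 55, 15, 95, 40, 110, 25],
        "support_calls": [4, 0, 2, 1, 5, 1, 6, 0],
        "churned":       [1, 0, 1, 0, 1, 0, 1, 0],
    })

    features = df.drop(columns="churned")
    model = LogisticRegression(max_iter=1000).fit(features, df["churned"])
    df["churn_prob"] = model.predict_proba(features)[:, 1]
    print(df[["tenure_months", "monthly_spend", "churn_prob"]])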

Confidential

SAS Developer/Analyst

Responsibilities:

  • Integrated all transaction data from multiple data sources used by Actuarial into a single repository.
  • Implemented and executed monthly incremental updates to the data environment.
  • Interacted with IT and Finance and executed data validation tie-out reports.
  • Developed new programs and modified existing programs, passing SAS macro variables to improve ease, efficiency, and consistency of results.
  • Created data transformation and data loading (ETL) scripts for data warehouses.
  • Implemented a fully automated data flow into Actuarial front-end (Excel) models using SAS processes.
  • Created SAS programs using SAS DI Studio.
  • Validated the entire data process using SAS and BI tools.
  • Extensively used PROC SQL for column modifications and field population on warehouse tables.
  • Developed distinct OLAP cubes from SAS datasets and exported the results into Excel sheets.
  • Involved in discussions with business users to define metadata for tables to perform ETL process.

