Data Scientist Resume
Dearborn, MI
SUMMARY
- Around 6 years of experience in Big Data, Data Science, data analysis, and information technology.
- Experience in developing predictive models in R and Python: supervised learning (Scikit-learn), unsupervised learning, deep learning (Keras), and PySpark.
- Good knowledge of Spark's in-memory capabilities and its modules: Spark Core, Spark SQL, Spark Streaming, and MLlib.
- Experience in NLP (NLTK, Gensim, Polyglot, and spaCy) and network analysis (NetworkX).
- Experience in developing Spark jobs using the PySpark API.
- Working knowledge of streaming applications and scheduling workflows.
- Extensive hands-on experience developing and maintaining big data streaming applications using Kafka and Spark Streaming.
- Knowledge of importing and exporting data between relational database systems and HDFS using Sqoop.
- Experience defining job flows in a Hadoop environment using tools such as Oozie for data scrubbing and processing.
- Deep knowledge of relational databases (SQL Server, MySQL, and Oracle).
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Hands-on experience working with Hadoop ecosystem tools such as Hive, Pig, Sqoop, Spark, Oozie, and Kafka.
- Experience in working with NoSQL databases such as HBase and Cassandra.
- Expertise in data cleaning, data analysis, and data visualization using Python (pandas, NumPy, Matplotlib, and Seaborn).
- Hands-on experience connecting to databases from Python (SQLAlchemy).
- Proficient in statistical modeling and machine learning techniques (linear regression, logistic regression, decision trees, random forests, SVM, and k-nearest neighbors).
- Hands-on experience in data modeling: dimensional and relational.
- Hands-on experience developing JavaScript, VBScript, and shell scripts.
- Experience designing visualizations in QlikView and publishing and presenting dashboards and storylines on web and desktop platforms.
- Deep knowledge of creating custom objects in QlikView using JavaScript.
- Deep knowledge of Alteryx tools and of creating macros in Alteryx.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, Scikit-learn, Hive, and Pig.
- Excellent team player with very good written and verbal communication skills.
TECHNICAL SKILLS
Analysis: R, Python, Tableau, Alteryx, QlikView, and MS Excel
Databases: SQL Server, MySQL, Oracle, HBase, and Cassandra
Languages: Python, Java, Scala, Pig Latin, and HiveQL
Big Data Technologies: Hadoop, Spark (Spark Core, Spark SQL, MLlib, and PySpark), HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Kafka, and Flume
Operating Systems: Windows 8, Windows 7, UNIX, Linux, and CentOS
Scripting Languages: UNIX shell scripting, JavaScript, and VBScript
Domain Knowledge: streaming analytics, big data, data science, IoT, and visualization
Tools: Git, MySQL Workbench, Toad for MySQL, SPSS Statistics, Minitab, QlikView, Alteryx, Eclipse, IntelliJ IDEA, Cloudera CDH5, and Hortonworks.
PROFESSIONAL EXPERIENCE
Confidential, Dearborn, MI
Data Scientist
Responsibilities:
- Built a classification model in R and Python (Scikit-learn) to classify customs data.
- Clustered stocks using k-means.
- Solved classification and regression problems using Keras.
- Performed text preprocessing and topic identification using Python packages (NLTK and Gensim). Built a text classifier with TfidfVectorizer features (NLP); see the first sketch after this list.
- Conducted a training session on analyzing and processing data with Pig Latin and HiveQL (Hadoop ecosystem).
- Used pandas methods (pivot_table, melt, and merge) for data wrangling and cleansing; see the pandas sketch after this list.
- Developed a Pig script to blend data and present it in a QlikView dashboard.
- Migrated relational database tables to Hadoop (Hive tables) using Sqoop.
- Scheduled and automated Sqoop jobs in Oozie.
- Wrote Hive queries to transform the data.
- Built data models (dimensional and relational).
- Developed an Alteryx macro using R (httr package) that uploads a file from a local disk or network drive to SharePoint libraries, eliminating the latency problem of Alteryx's Windows Authentication connection to SharePoint.
- Developed an Alteryx security tool (XML parsing and R) that reports on the tools used in workflows (lists, data/table/query, and connections) and acts as a firewall, checking memory utilization, server load, personal data, and access issues to stop a workflow before it is published to the server.
- Built a custom object in QlikView using JavaScript to capture user input from the dashboard and send it to the Alteryx server, which processes the data and stores it in a database.
- Automated the Alteryx license file installation with VBScript and T-SQL.
- Conducted a training session on building custom objects in QlikView.
- Proficient in R, JavaScript, VBScript, HDFS, HiveQL, Pig Latin, Sqoop, and SQL.
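A minimal sketch of the TfidfVectorizer-based text classification mentioned above, assuming a small set of labeled text records; the sample documents, labels, and the logistic regression classifier are illustrative placeholders, not the original customs data or model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder documents and labels; the real customs data is not shown here.
docs = [
    "electronics shipment declared under tariff code 8517",
    "perishable produce requiring cold storage inspection",
    "machinery parts imported for assembly line repair",
    "fresh flowers cleared with phytosanitary certificate",
]
labels = ["electronics", "agriculture", "machinery", "agriculture"]

# TF-IDF features feeding a simple classifier (logistic regression as an example).
clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(docs, labels)
print(clf.predict(["cold storage produce shipment"]))
```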
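Likewise, a small illustrative example of the pandas pivot_table / melt / merge wrangling pattern referenced above; the column names and values are made up for the example:

```python
import pandas as pd

# Toy frames standing in for the real source tables.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "units": [10, 12, 7, 9],
})
targets = pd.DataFrame({"region": ["East", "West"], "target": [25, 20]})

# Reshape wide, then back to long, then join in reference data.
wide = sales.pivot_table(index="region", columns="month", values="units", aggfunc="sum")
long = wide.reset_index().melt(id_vars="region", var_name="month", value_name="units")
clean = long.merge(targets, on="region", how="left")
print(clean)
```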
Confidential, Bay Area, CA
Solutions Consultant
Responsibilities:
- Worked on a flight delay prediction model, joining flight and weather datasets using Spark SQL and HiveQL and processing the weather data in Scala.
- Built an end-to-end machine learning pipeline in PySpark to predict whether flights will be late; see the pipeline sketch after this list.
- Built a predictive model in R using the random forest algorithm to estimate the probability of driver alertness from human, vehicle, and environmental factors. The model runs on streaming data and is published to a dashboard that shows how each driver is performing at each timestamp and alerts them through notifications.
- Parsed a JSON dataset and converted it to CSV format using Spark SQL and R.
- Performed a network word count using Spark Streaming.
- Ingested streaming data using Apache Kafka and Spark Streaming.
- Used a Spark StreamingContext on top of the Spark core engine to read lines from a socket text stream, split each line into words on spaces, map each word to 1, and reduce by key to count word occurrences; see the word-count sketch after this list.
- Performed server log analysis using Spark and wrote the results to a MySQL database.
- Built an IoT model in Vitria that fetches data from an SFTP server and pushes it to an HDFS target. An Apache Spark service then reads the data from HDFS, scores it with predictive models (random forest and linear regression) based on key factor indicators, and publishes the results to a dashboard that displays them at the population, device, and group levels.
- Wrote Scala code that fetches current and future weather data from a REST URL, parses the desired fields from the JSON response, and stores them in the database.
- Wrote a JavaScript function that parses XML data from a REST URL.
- Wrote a JavaScript function to display sample sales data in a bipartite widget.
- Created a form using HTML5 and validated it with a JavaScript function.
- Created SVG diagrams (paths and points of interest) in a geo-map overlay using Dojo JavaScript, showing details on hover over a point of interest.
- Worked with MySQL and Oracle databases to pull data.
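A condensed sketch of the kind of PySpark pipeline described above for predicting late flights; the file path, column names, and the random forest stage are assumptions for illustration, not the original feature set:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("FlightDelaySketch").getOrCreate()

# Hypothetical joined flight + weather table with an is_late label column.
flights = spark.read.csv("flights_weather_joined.csv", header=True, inferSchema=True)

stages = [
    StringIndexer(inputCol="carrier", outputCol="carrier_idx"),
    VectorAssembler(
        inputCols=["carrier_idx", "dep_hour", "distance", "wind_speed"],
        outputCol="features"),
    RandomForestClassifier(labelCol="is_late", featuresCol="features"),
]

train, test = flights.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=stages).fit(train)   # fit indexer, assembler, and classifier together
predictions = model.transform(test)
predictions.select("is_late", "prediction", "probability").show(5)
```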
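And a minimal DStream-based sketch of the socket word count described above; the host, port, and batch interval are placeholders:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="NetworkWordCount")
ssc = StreamingContext(sc, 10)                       # 10-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)      # read lines from a TCP socket
words = lines.flatMap(lambda line: line.split(" "))  # split each line on spaces
pairs = words.map(lambda word: (word, 1))            # pair each word with a count of 1
counts = pairs.reduceByKey(lambda a, b: a + b)       # sum counts per word in each batch

counts.pprint()
ssc.start()
ssc.awaitTermination()
```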
Confidential
Academic Internship/ Data Analyst
Responsibilities:
- Worked with project team representatives to ensure that logical and physical ER data models were developed in line with corporate standards and guidelines.
- Involved in defining source-to-target data mappings, business rules, and data definitions.
- Worked with Toad to submit SQL statements, import and export data, and generate reports in SQL Server.
- Involved in defining the business/transformation rules applied for service data.
- Coordinated with business users to design new reporting in an appropriate, effective, and efficient way on top of existing functionality.
- Implemented stored procedures and triggers and executed test plans.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Kept the Enterprise Metadata Library up to date with any changes or updates.
Confidential
Academic Internship / Analyst
Responsibilities:
- Maintained spreadsheets and databases of student details and business management data.
- Wrote SQL queries to generate weekly reports on individual student performance.
- Created logical and physical data models in SQL Server using SQL Server Management Studio.
- Created stored procedures to automate the process.
- Worked in spreadsheets to perform what-if analysis, VLOOKUPs, and pivot tables.
- Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
- Created a spreadsheet dashboard to show individual student performance in every department.
- Organized a team presentation at the end of each week to review group activities and clarify concerns.
Confidential
Technical Trainee
Responsibilities:
- Worked with microcontrollers and microprocessors (8085, 8086, and PIC 16F887).
- Worked in MATLAB to process signal data.
- Learned to design and simulate digital communication systems.