Data Scientist, Big Data Engineer Resume

New York City, NY


  • Over 10 years of experience in the IT industry, including around 4 years as a Data Scientist working with Big Data technologies.
  • 4 years of experience in Hadoop and its ecosystem, including HDFS, MapReduce, Impala, YARN, Hive, Pig, HBase, Sqoop, Flume, Spark and Oozie.
  • Working knowledge of Spark Core and Spark SQL.
  • Expertise in data load management, importing and exporting data using Sqoop.
  • Excellent programming skills at a higher level of abstraction using Scala and Spark.
  • Experience in writing, testing and debugging MapReduce programs and using the Apache Hadoop API to analyze data. Hands-on NoSQL database experience with HBase.
  • Used various machine learning algorithms for forecasting - linear and logistic regression, K-nearest neighbors (KNN), decision trees, random forests, boosting and bagging, matrix determinant analysis and model blending.
  • Built predictive analytics models to generate actionable insights.
  • Experience in gathering business requirements from the client and preparing functional and technical documents.
  • Experience in leading offshore and on-site teams and delivering on time.
  • Implemented automation in the Confidential project Jarvis using machine learning algorithms and TensorFlow.
  • Expert in Agile methodologies and worked as a Scrum master to boost the productivity of the team.
  • Headed Confidential Genius team to implement an idea each quarter to improve the project and the process.
  • Possess strong interpersonal, communication and client-facing skills and ability to work closely with users, cross-functional teams, testing teams, external vendors, onsite and offshore teams.
  • Implementation and customization of Oracle E-Business Suite (EBS). Strong in SCM (INV, PO, OM, BOM, WIP) and Financials (AP, AR, GL), Procure to Pay (P2P), Order to Cash (O2C), internal sales orders, drop shipment and the back-to-back cycle. Experience with various RICE components (Reports, Interfaces, Conversions and Extensions) following CEMLI standards.


HADOOP FRAMEWORK: HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Flume, Spark, Oozie, HBASE, IMPALA, Scala, Cloudera Distribution Hadoop


TOOLS: Confidential CLOUD ENGINE, Confidential CLOUD PUB/SUB, Confidential BigQuery, RStudio, Jupyter Notebook, Tableau, SAP BO, MS EXCEL, Oracle DISCOVERER, ODI, OBIEE, XML Publisher, TOAD, SQL*LOADER, RDF REPORTS, WEB ADI, REVPRO, ORACLE APEX, Oracle TEXT, OAF, ADF, JDEVELOPER.

OS & DATABASE: MAC OS X 10.8+, WINDOWS 7, 8, 10, WIN 2008 SERVER, UBUNTU SERVER 12.10, MYSQL 5.X+, ORACLE 11G, ORACLE 9I, 10G, 11I, R12

LANGUAGES: SQL, PL/SQL, HTML, CSS, Java, JavaScript, XML, Python (NumPy, Pandas, Matplotlib, scikit-learn machine learning libraries)


Confidential, New York City, NY.

Data Scientist, Big Data engineer


  • Assess the validity and rigor of new data sources and approaches, and the capabilities of potential solutions.
  • Oversee data-focused initiatives, and drive developments in advancing big-data and analytic services.
  • Establish measurement approaches to assess relative performance of customer retention strategies within our sales and marketing channels.
  • Work closely with the Data Strategy, International and executive teams to build results-focused data analytics.
  • Architected the data lake and the move from the on-premises data center to Amazon AWS.
  • Used Sqoop to import data into Cassandra tables from different relational databases.
  • Automated multiple self-service jobs using TWS.
  • Imported data from databases into HDFS and exported it back using Sqoop.
  • Worked on the core and Spark SQL modules of Spark extensively using programming languages like Python and Scala.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and databases such as HBase, using Python/PySpark and Scala.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs utilizing Spark, Hive and Pig.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Involved in creating Hive tables, loading with data and writing hive queries.
  • Responsible for exporting analyzed data to relational databases using Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Created Hive tables to store the processed results in a tabular format.
  • Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
  • Created tables in Hive and integrated data between Hive and Spark.
  • Responsible for tuning Hive and Pig scripts to improve performance.
  • Created reports using the SAP BO client by extracting data from Impala to Universe.
  • Extensively worked with Cloudera Distribution Hadoop, CDH 5.x.
  • Completed a proof of concept (POC) on Amazon AWS (S3, EMR, Redshift) and Snowflake.
  • Migrated huge chunks of data into AWS S3 and created pipelines for cloud migration.
  • Responsible for the transition from Hadoop to a hybrid architecture of AWS and Snowflake.

Confidential, Sunnyvale, CA.

Data Scientist, Big Data Developer


  • Gathered client requirements, prepared the BRD (MD50), coordinated offshore and onshore team members to ensure they understood the requirements, prepared the SDLC and obtained financial approvals.
  • Designed and developed Big Data (Hadoop) and/or cluster-based (Spark) implementations of algorithms, models and processes to improve operational efficiency in data classification, using Python and relevant libraries including scikit-learn, along with parallel processing and programming techniques.
  • Worked on BigQuery.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used data visualization tools like Tableau and BO to visualize the data to the business.
  • Created the external tables in Hive to store property information.
  • Imported data from databases into GFS (HDFS) and exported it back using Sqoop.
  • Configured the Sqoop jobs to load the property information data from GFS into Hive Tables.
  • Extensively worked on partitioning Hive Tables and running the hive scripts in parallel to reduce run time of the scripts.
  • Extensively used Sqoop to import and export home value information between GFS and Hive tables, including incremental imports, and created Sqoop jobs that track the last saved value.
  • Tracked home value information in web applications, using Hive drivers to interact with Hive tables.
  • Collected log data from web servers and ingested it into HDFS using Flume.
  • Worked with Apache Spark, a fast, general-purpose engine for large-scale data processing, together with the functional programming language Scala.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Created Hive tables to store the processed results in a tabular format.
  • Used Cider extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Good experience in importing other enterprise data from different data sources into HDFS using Sqoop and Flume, performing transformations using Hive and Pig, and then loading the results into HBase tables.
  • Experienced in migrating HiveQL queries to Impala to minimize query response time.
  • Involved in reviewing code and requirements for other team members to ensure that delivery quality is up to standard.
  • Involved in Code migration, testing and making sure that enhancements are deployed in production.
  • Used various Python libraries such as NumPy and Pandas for data analysis.
  • Supported and guided multiple tracks in Oracle EBS, including Financials (AP, AR, GL, FA, CM), SCM (INV, OM, PO) and Manufacturing (BOM, WIP, ASCP, WMS).
  • Worked on value sets, lookups, executables and concurrent programs. Developed an interface for AR Lockbox in the AR module.
  • Led activities such as DIRT (Disaster Recovery Test) and on-call service.
  • Created packages, procedures, functions, triggers and other database objects using SQL and PL/SQL.
  • Seamlessly adopted many Confidential-specific tools, such as Critique for version control, BUGANIZER for managing change requests and Guts for ticketing.


Techno-Functional Consultant


  • Involved in preparing technical documents such as MD70.
  • Developed various interfaces to transfer data from the legacy system to Oracle tables using TOAD, and validated the data by creating PL/SQL packages.
  • Created packages, procedures, functions, triggers and other database objects using SQL and PL/SQL. Worked on value sets, executables and concurrent programs.
  • Wrote PL/SQL programs to implement validation rules and load data from staging tables to interface tables in Oracle EBS.
  • Involved in developing new reports per client requirements using XML Publisher and Reports 6i, 10g.
  • Developed Discoverer reports in Oracle Payables. Worked on creating custom workflows.
  • Involved in Code migration, testing and making sure that enhancements are deployed in production.
  • Worked on Web ADI, customer outbound interface.
  • Developed a purchase order interface to upload STANDARD purchase orders into Oracle base tables through interface tables in Oracle EBS.
  • Created an interface to load data from Oracle to ARIBA for P2P.
  • Proficient in writing shell scripts and registering them as host programs.
  • Developed conversions and interfaces for Blanket PO, Standard PO, Receipts and Invoices.
  • Validated staging data before insertion into Oracle base tables or interface tables.
  • Developed Discoverer reports, XML Publisher reports and D2K reports in various modules such as PO, AP, AR, GL and PA.


Oracle Technical Consultant


  • Developed new reports using BI Publisher XML Reports without using RDF
  • Involved in bug fixing of various RICE components
  • Prepared documents like MD70, MD120, CV60 and Test Results
  • Involved in development of packages, procedures, functions and triggers
  • Involved in the Order to Cash (O2C) and Procure to Pay (P2P) modules; implemented Oracle AME in PO and AP.
  • Performance tuning (create indexes, hints, materialized views)
  • Involved in form personalization, forms enhancement.
  • Analyzed the functional documents and prepared the Technical Designs for Forms and Reports.
  • Developed new forms and reports as per business requirements of client.
  • Coordinated between onshore and offshore teams for Oracle ERP. Involved in execution of month-end and weekend batch processes.
  • Involved in creating and executing unit test cases, and in reviewing the team's code and unit testing documents.
  • Defined value sets and concurrent programs, and registered reports and PL/SQL procedures and functions.
  • Automated Quote to Cash Process as part of R12 Implementation.
  • Worked on tools such as Oracle Forms Builder, Reports Builder, TOAD, SQL Developer, SQL*Loader and PVCS, and Kintana for deployment.
  • Worked on Fixed Assets, OAF (Oracle Application Framework) work request.
  • Provided knowledge transfer (KT) to junior team members on the P2P, O2C, AP, FA, CM and PA modules in Oracle EBS.
  • Developed XML reports on Fixed Assets without using RDF files.


Associate Consultant


  • Developed and customized reports and data conversions.
  • Involved in writing control files for SQL*Loader.
  • Executed concurrent programs from PL/SQL.
  • Defined executable programs.
  • Registered concurrent programs.
  • Involved in AOL for Oracle ERP.
  • Developed XML Publisher reports.
  • Involved in data conversions in various modules such as PO, OM and INV in Oracle EBS.
  • Involved in preparing MD70 and MD120 documents.
  • Involved in unit test cases, code review and performance tuning.
  • Involved in value-added services for the projects.
  • Moved objects from one instance to another using FNDLOAD via PuTTY.