
Senior Hadoop And Spark Developer Resume

Chicago, IL


  • Accomplished Senior Hadoop and Spark Developer with over 15 years of experience in Hadoop big data architecture, cluster maintenance and implementation. Work with large structured, semi-structured and unstructured medical and retail data sets of up to one trillion records and terabytes in size. Research and apply statistical methods from data science, and build machine learning and natural language processing applications. Find solutions to business problems based on requirements.
  • Apply data processing and business logic in Hadoop and Apache Spark using MapReduce, Hive, HBase and Scala. The resulting data is used for data science (R Studio) and machine learning with k-means, linear regression, multiple regression, decision tree and recommendation system algorithms, and for reports built in Tableau and SSRS.
  • Strong experience with Hadoop cluster architecture, including metadata disk failure, data integrity, cluster balancing, replication pipelining and data blocks.
  • Hands-on experience processing large sets of structured data in Hadoop using Sqoop, and semi-structured and unstructured data using Flume. Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Strong experience in data modeling design and architecture. Build conceptual, logical and physical data models. Architect OLTP databases and OLAP (MOLAP, ROLAP) data warehouses and data marts.
  • Over 14 years of professional BI/ETL experience, with proven expertise in leading and managing the phases of DW & BI projects, such as scoping study, requirements gathering, analysis, planning, design, development, testing and implementation of data warehousing, ETL and BI solutions for reputable companies and banks such as GE Financials, Confidential retail and Financials. Participated in more than 11 DW & BI project implementations. Responsible for managing and delivering all projects in the DW portfolio on time and within budget, and for ensuring that strategic and business requirements are met.
  • Specialize in enterprise data warehouse architecture, strategy studies, data warehouse implementation, DW, ETL and BI architecture, technology selection and proofs of concept. Received awards for developing reusable assets in the information management and business intelligence space and for guiding the team to nurture and promote innovation. Specialize in providing architecture solution blueprints that enable the use of corporate assets for building DW, BI and ETL solutions, and in managing multi-technology, multi-geography DW & BI implementations and data governance programs.
  • Expertise in DB2, Hadoop, Oracle 9i/10g, Informatica Power Center, SQL Server 2014/2012/2008/2005 Integration Services (SSIS) and SQL Server 2014/2012/2008/2005 Reporting Services (SSRS).
  • Strong architecture experience in SQL Server database design, ETL processes, SSAS analysis, SSRS, SQL performance tuning, isolation levels and security.
  • Dimensional, physical and logical data modeling using ERwin and ER-Studio.
  • Experience using the Java programming language to write code that performs tasks in Informatica.
  • Strong knowledge of data science: extracting knowledge and insights from structured and unstructured data; finding and interpreting rich data sources; managing large amounts of data despite hardware, software and bandwidth constraints; merging data sources and ensuring dataset consistency; creating visualizations to aid understanding; building mathematical models from the data; and presenting and communicating insights and findings.
  • Professional experience in BI design and implementation using Informatica and SSIS, implementing business intelligence solutions through data warehouse/data mart design, ETL, OLAP, BI and client/server applications.
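
Several bullets above mention k-means clustering. The projects describe doing this in R Studio; purely as a hedged illustration of the technique, here is a minimal pure-Python sketch of Lloyd's k-means iteration on numeric points. The function name `kmeans`, its parameters and the seeded random initialization are assumptions for illustration, not code from these projects.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster numeric tuples with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels
```

`kmeans(points, k=2)` returns the final centroids and one cluster label per point. A production job would normally call a library implementation (for example R's `kmeans` or Spark MLlib's `KMeans`) rather than this loop.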


Confidential, Chicago, IL

Senior Hadoop and Spark Developer

Environment: Hadoop 2.0, Hue, Apache Spark, MapReduce, Eclipse, Java, Scala, Hive, Sqoop, Flume, Oozie, HBase, Kafka, R Studio, Tableau, IBM DB2, SQL Server 2014.


  • Designed an ETL pipeline using Sqoop, and implemented and validated solutions in MapReduce, Apache Spark and Apache Hive using Scala on a large, state-of-the-art cluster. The system reads big data files from various health care companies and pharmacy vendors.
  • Used Hive to handle large structured data sets, and created Hive partitions and indexes to read data faster. Wrote complex HiveQL queries to implement business logic.
  • Used Tableau for reporting; Tableau connected to the Hive DW and produced in-memory reports. Built tabular, graph, bar and pie chart reports based on business needs.
  • Used Hue, the open-source web interface for Hadoop, to load, view, prepare, process, analyze and visualize data, and to work with Hive, Pig and Spark SQL.
  • Implemented Hive dynamic partitions, custom Map/Reduce scripts, Hive indexes and views, and Hive query optimizations.
  • Worked on a scalable, fault-tolerant Kafka real-time data pipeline that streams online portal log data to analyze daily visited pages, using the Producer, Consumer, Streams and Connector APIs.
  • Worked on the advanced HBase data model, HBase shell, HBase client API, data loading techniques, the ZooKeeper data model and service, bulk loading, getting and inserting data, and HBase filters.
  • Used the Oozie workflow scheduler for Hadoop jobs. The workflow contains a Sqoop ETL process, Hive scripts and a Pig task, and starts when a new file appears in the shared folder.
  • Worked on Talend integration with Hadoop. Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data.
  • Worked on Hadoop MapReduce, the YARN MapReduce application execution flow and workflow, the anatomy of a MapReduce program, input splits, and the relationship between input splits and HDFS blocks.
  • Used R Studio to apply data science and find the right model for the business using linear regression, logistic regression and SVM algorithms, and applied machine learning algorithms such as decision trees, KNN and k-means to support decisions.
  • Used machine learning expertise to research and recommend the best approaches to solving technology and business problems.
  • Used machine learning anomaly detection to find unusual quantity and sales data, applying density-based techniques such as k-nearest neighbor and local outlier factor, and correlation-based outlier detection for high-dimensional data.
  • Generated reports by company and product, by geography, product and company, and by product and vendor.
  • Used Apache Spark with Scala to run the business logic algorithms that extract the required information; Spark runs these algorithms quickly by processing data in memory.
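
The anomaly-detection bullet names k-nearest-neighbor density techniques. A minimal sketch of the simplest of these, assuming each record is a numeric (quantity, sales) tuple: score each point by its distance to its k-th nearest neighbor and flag scores above a threshold. The names `knn_outlier_scores` and `flag_anomalies` and the threshold are hypothetical, and this is not Local Outlier Factor, which additionally normalizes by the neighbors' own local density.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Points in sparse regions get large scores, so high scores suggest anomalies.
    This is the plain kNN-distance method, not Local Outlier Factor.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_anomalies(points, threshold, k=2):
    """Return indices of points whose kNN-distance score exceeds the threshold."""
    return [i for i, s in enumerate(knn_outlier_scores(points, k)) if s > threshold]
```

The brute-force loop is O(n²), and features on different scales (quantity vs. sales amount) should normally be standardized before measuring distance; both concerns are left out to keep the sketch short.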

Confidential, Chicago, IL

Senior Hadoop and Spark Developer

Environment: Hadoop, MapReduce, Hive, R Studio, Eclipse, Java, Oracle 10g, Informatica Power Center 8.1, SQL Server 2014, Tableau, C# Scripting, HTML, SSAS, Table Partition.


  • Worked with Flume and MapReduce to read and analyze daily web portal activity such as user clicks, sales, interests and reviews. Wrote complex logic to understand customer needs.
  • Used Oozie to schedule the daily execution of business logic as validation tests. Used Hive to write SQL queries, and implemented partitioning and query optimization.
  • Used Sqoop to import structured data from third parties; after the data validation tests complete, the output is stored in the Hive DW.
  • Designed the database, identified the attributes, and created master tables, lookup tables and transaction/historical tables. Used ERwin to create table relationships and eliminate redundant columns.
  • Identified the business needs and architected the ETL process, the analysis/rule engine stored procedures and the validation test stored procedures.
  • Used Informatica Power Center 8.6 for extraction, transformation and loading (ETL) of data into the data warehouse.
  • Used Informatica Power Center Workflow Manager to create sessions, workflows and batches that run with the logic embedded in the mappings.
  • Used PL/SQL procedures in Informatica mappings to truncate the data in target tables at run time.
  • Extensively used the Informatica Debugger to find problems in mappings, and troubleshot existing ETL bugs.
  • Used R Studio to find the right model and predict product sales, quantity and color for coming years. Applied linear regression, multiple regression and classification algorithms for data analysis, and used a machine learning recommendation engine whose results are shown as recommendations alongside the selected product.
  • Used recommendation algorithms to recommend products based on users' interests.
  • Used table partitioning to handle 1.5 trillion transaction records efficiently. Partitioning on a unique month-and-year number lets each query read a small partition of data instead of all 1.5 trillion records. Added nonclustered indexes on this table to increase performance.
  • Worked on extraction, transformation and loading (ETL) of data from various sources into data warehouses and data marts using Informatica Power Center (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Metadata Manager), Power Exchange and Power Connect as ETL tools on Oracle, DB2 and SQL Server databases.
  • Created database table schemas and designs following best practices. Implemented table partitioning, page compression and columnstore indexes.
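
The table-partitioning bullet describes queries that read one month's partition instead of all 1.5 trillion rows. The toy sketch below illustrates that partition-elimination idea in memory, assuming a hypothetical `YYYYMM` integer as the month-and-year partition number; in SQL Server the same effect comes from a partition function and scheme on the real table, not from application code like this.

```python
from collections import defaultdict
from datetime import date


def partition_key(d):
    """Map a transaction date to a hypothetical YYYYMM partition number."""
    return d.year * 100 + d.month


class PartitionedTable:
    """Toy in-memory table that stores rows in per-month partitions."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, txn_date, amount):
        # Route each row to its month's partition at write time.
        self.partitions[partition_key(txn_date)].append((txn_date, amount))

    def total_for_month(self, year, month):
        """Return (total amount, rows scanned), touching only one partition."""
        # .get avoids creating an empty partition for months with no data.
        rows = self.partitions.get(year * 100 + month, [])
        return sum(amount for _, amount in rows), len(rows)
```

The second value returned by `total_for_month` shows the point of the design: the rows scanned equal the size of one month's partition, not the size of the whole table.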

Confidential, Chicago, IL

Senior BI/ ETL Architect

Environment: Oracle 9i, Informatica Power Center 8.1, Hadoop, MapReduce, Sqoop, SQL Server 2012, SSIS, SSRS, C# Scripting, HTML, SSAS, Table Partition, PL/SQL Developer, Bourne shell.


  • Designed data models based on business needs, creating conceptual, logical and physical models. Designed table schemas and created relationships using ERwin while avoiding data redundancy.
  • Used SQL Server isolation levels to maintain data accuracy and ensure that only committed data is read.
  • Tuned mapping performance by following Informatica best practices and applied several methods to reduce workflow run times.
  • Automated the Informatica jobs using UNIX shell scripting.
  • Used Hadoop MapReduce to analyze untrusted automobile company log files and browser history, and wrote complex logic to identify potential customers.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Used Informatica Power Center Workflow Manager to create sessions, workflows and batches that run with the logic embedded in the mappings.
  • Created procedures to truncate data in the target tables before the session runs.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Used Java Spring to create Hadoop MapReduce programs, and implemented Java libraries for use in ETL tools.
  • Improved SQL performance using scale-up and scale-out approaches, which helped applications run smoothly without server issues. Increased server and memory capacity and kept production applications on independent servers.
  • Implemented the data warehouse and data marts covering sales, product details and vendor item selection/purchase details, so the business can get daily reports.
  • Created complex SSIS jobs to extract data from IBM DB2 and load it into a local SQL Server database, applying data transformations and logical conditions to filter out unneeded data.
  • In SSIS, used variables to pass data between tasks and a configuration file holding all connection and server credentials.
  • Used C# scripts to perform complex business rule calculations and send email when a red flag was raised; the email was built in HTML format and sent using the SSIS Send Mail Task.
  • Created complex SSRS reports showing product information and product allocation, using row and column grouping.
  • Used the Teradata utilities BTEQ, FastExport (FEXP), FastLoad (FLOAD) and MultiLoad (MLOAD) to export data to and load data from flat files.
  • Improved SQL performance by adding nonclustered indexes on columns used in WHERE clauses and joins, and by replacing cursors with WHILE loops.
  • Parameterized the mappings to increase reusability.
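
The bullets above describe MapReduce programs that parse raw log data. As a hedged, single-process illustration of the map, shuffle and reduce phases (not Hadoop's Java API), this sketch counts visits per URL, assuming a hypothetical `visitor_id,url` log line format and skipping malformed lines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (url, 1) for a well-formed 'visitor_id,url' line (hypothetical format)."""
    parts = line.strip().split(",")
    if len(parts) == 2 and parts[1]:
        yield parts[1], 1


def shuffle_phase(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence and return {url: visit_count}."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

On a real cluster each phase runs in parallel across input splits and the shuffle moves data between nodes; here all three run in one process so the data flow is easy to follow.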

Confidential, Minneapolis, MN

Senior SQL Server SSIS, SSRS / BI Developer

Environment: Java, JSP, NetUI, Weblogic, JAX-RPC, JAX-WS web services and Axis server, Oracle, AmberPoint to monitor the services, X-Query for transforming the schemas.


  • Worked on requirements analysis, application design and architecture, development, testing and deployment (DEV, APP, SIT and production servers) for all modules.
  • Communicated with the project manager, designers, testers and developers.
  • Used JIRA to log design change requests and Quality Center to log defects.
  • Used X-Query transformations to convert XML documents from one schema to another.
  • Used KODO (an ORM framework) to access the database in an object-relational manner.
  • Developed the web services using JAX-WS and JAX-RPC.
  • Used XMLBeans to map Java objects to XML documents and vice versa.
  • Used Cascading Style Sheets (CSS) to develop the portal pages.
  • Developed custom tags for the pages.
  • Developed JavaScript validation functions for the portal pages.
  • Used generics, reflection, autoboxing and other new Java 5 features to improve object-oriented programming.
  • Involved in developing the portal application using Page Flow, HTML/JSP portlets and the NetUI framework.
  • Developed multiple portal desktops for Internet and intranet applications.
  • Worked with LDAP to provide roles and permissions for users to access the portal pages.
  • Granted visitor entitlements in the Portal Admin Console (streaming mode).
  • Developed proxy and business services using the ESB, which plays a key role in the SOA as the integration/middle layer.
  • Developed some of the processes as WLI processes.
  • Created Data models and sequence diagrams.
  • Generated JUnit and EMMA reports for the validation classes.
  • Involved in database modeling and normalization.
  • Developed tactical services (which read data from XML files) using JAX-RPC and deployed them on the Axis server for consumption.

Confidential, Boston, MA

Java Web Developer

Environment: Java, Oracle, WebSphere, ClearCase, ClearQuest, JavaScript, Ajax controls.


  • Developed a web service client project to validate addresses using Finalist software.
  • Implemented the Connection pooling for the batch projects using DBCP.
  • Developed the web services for all forms in the project using JAX-WS RI framework.
  • Involved in the database schema design and normalization process.
  • Generated JUnit and EMMA reports for the validation classes.
  • Developed the DAOs using the FAST4J ORM tool to connect to the database.
  • Developed the CCMS business rules using Drools API.
  • Developed custom utility methods using log4j to log all queries in the application to a separate log file.
  • Developed the batch process to handle mass mail and email.
  • Involved in code reviews of all modules in the project.
  • Involved in writing the SQL queries for the project.
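
Connection pooling with DBCP appears above. DBCP is a Java library; as an illustration of the same idea only, the sketch below keeps a fixed number of pre-opened connections in a thread-safe queue and lends them out through a context manager. The `ConnectionPool` class and its parameters are hypothetical, and SQLite in-memory connections stand in for the real database.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that lends out pre-opened connections and takes them back."""

    def __init__(self, factory, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open every connection up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Wait for an idle connection; raises queue.Empty if none frees up in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller's code raised.
            self._idle.put(conn)

    def close_all(self):
        """Close every idle connection (call once all borrowers are finished)."""
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                break


if __name__ == "__main__":
    pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())
    pool.close_all()
```

Batch jobs borrow a connection for each unit of work instead of opening a new one, which is the cost pooling is meant to avoid; a blocking `get` with a timeout keeps callers from exceeding the pool size.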
