- Over 7 years of experience in information security and enterprise software, most recently in Big Data technologies: designing and building high-performance, scalable systems on Big Data platforms, and implementing and testing client-server applications using Java and J2EE technologies.
- Excellent understanding of Hadoop and underlying framework including storage management. Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Expertise in data analysis, design and modeling using tools like ErWin.
- Expertise in Big Data technologies such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB and NoSQL.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing and analysis of data.
- Experienced in using various Hadoop ecosystem components such as MapReduce, Hive, Sqoop and Oozie.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
- Experienced in collecting log data from various sources and integrating it into HDFS using Flume, and in developing custom UDFs for Hive.
- Experienced in testing data in HDFS and Hive for each transaction of data.
- Experienced in importing and exporting data between HDFS and relational database systems using Sqoop.
- Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper; good working knowledge of Amazon Web Services components such as EC2, EMR and S3; good experience in shell programming.
- Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters; knowledge of and experience with NoSQL databases such as Cassandra and MongoDB.
- Experienced in application development using Java, J2EE, JDBC, Spring and JUnit.
- Experienced in developing enterprise applications with J2EE/MVC on application and web servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
- Strong experience working with databases such as Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL, and proficiency in writing complex SQL queries.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
- Strong expertise in Amazon AWS EC2, S3, Kinesis and other services.
Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MongoDB
Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark
Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery
Databases: Oracle 12c/11g/10g/9i, Microsoft Access, MS SQL
NoSQL Databases: Cassandra, MongoDB
Web/Application servers: Apache Tomcat 6.0/7.0/8.0, JBoss
Frameworks: MVC, Struts, Spring, Hibernate.
Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.
Network protocols: TCP/IP fundamentals, LAN and WAN.
Confidential - Princeton, NJ
Sr. Big Data/Hadoop Developer
- Contributing to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop and other big data technologies for leading organizations using major Hadoop Distributions like Hortonworks and Cloudera.
- Working with Amazon AWS services (EMR, EC2, RDS, S3, Redshift, etc.) and tools including Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Flume and Spark.
- Loading log data directly into HDFS using Flume on Cloudera CDH. Involved in loading data from the Linux file system into HDFS on Cloudera CDH.
- Ran Hadoop streaming jobs to process terabytes of XML-format data. Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop on Cloudera.
- Installed and configured MapReduce, Hive and HDFS. Developing Spark scripts in Java to read and write JSON files as required. Importing and exporting data into HDFS and Hive using Sqoop.
- Worked on Hadoop administration, development and NoSQL on Cloudera. Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs. Automated all jobs that pull data from an FTP server and load it into Hive tables, using Oozie workflows.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team, using Sqoop to export data into HDFS and Hive. Configured and installed Hadoop and Hadoop ecosystem components (Hive/Pig/HBase/Sqoop/Flume).
- Designed and implemented a distributed data storage system based on HBase and HDFS. Imported and exported data into HDFS and Hive.
- Designed and implemented a data warehouse, creating fact and dimension tables and loading them with Informatica PowerCenter tools, fetching data from the OLTP system into the analytics data warehouse.
- Coordinated with business users to gather new requirements and work through existing issues; read multiple data formats on HDFS using Scala.
- Loaded data into Parquet files by applying transformations using Impala. Executed parameterized Pig, Hive, Impala and UNIX batches in production.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; analyzed the SQL scripts and designed the solution for implementation in Scala.
- Investigated issues as they arose and resolved them through root cause analysis, incident management and problem management processes.
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Designed and developed integration APIs using data structures, the Java Collections Framework and exception handling to return responses within 500 ms. Used Java threads to handle concurrent requests.
Environment: Hadoop 1.x/2.x MR1, Cloudera CDH3U6, HDFS, Spark, Scala, Impala, HBase 0.90.x, Flume 0.9.3, Java, Sqoop 2.x, Hive 0.7.1, Tableau (Online, Desktop, Public, Vizable).
Confidential - New York, NY
Sr. Big Data Developer
- Provide subject matter expertise and hands-on delivery of Extract, Load and Transform on Hadoop distribution platforms such as Hortonworks and Cloudera. Provide a domain perspective on the usage of Hadoop distribution tools.
- Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts. Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce, as well as Spark and Python.
- Developed Scala scripts and UDFs using RDD/MapReduce in Spark 1.7 for data aggregation, queries and writing data back into the OLTP system directly or through Sqoop. Worked extensively on creating MapReduce jobs to power data for search and aggregation. Transferred data from AWS S3 to AWS Redshift using Informatica.
- Experience in performance tuning a Cassandra cluster to optimize it for reads and writes.
- Monitored the Cassandra cluster for resource utilization. Managed Cassandra clusters using DataStax OpsCenter.
- Involved in designing and developing enhancements of CSG using AWS APIs.
- Experience with BI reporting using AtScale OLAP for Big Data.
- Analyzed system failures, identified their root causes and recommended courses of action.
- Responsible for importing log files from various sources into HDFS using Flume.
- Expert in writing business analytics scripts using Hive SQL.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs; integrated Oozie logs into a Kibana dashboard.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files and Parquet files.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Escalated existing problems and risks with the application.
- Identified problems with the data and produced derived data sets, tables, listings and figures that analyzed the data to facilitate correction.
- Developed a Spark Streaming application to pull data from the cloud into Hive tables.
- Used Spark SQL to process large volumes of structured data.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns; developed predictive analytics using the Apache Spark Scala APIs.
- Created Hive external tables, loaded data into them and queried the data using HQL.
- Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning.
- Expertise in data development on the Hortonworks HDP platform and Hadoop ecosystem tools such as Hadoop, HDFS, Spark, Zeppelin, Hive, HBase, Sqoop, Flume, Solr, Pig, Oozie and Apache Kafka.
Environment: Big Data, Spark, Zeppelin, AWS, Cloudera, EMR, JDBC, Redshift, NoSQL, YARN, Hive, Pig, Scala, Python, Hadoop.
Confidential - Northbrook, IL
Big Data Developer
- Designed and Developed Business applications and Data marts for Marketing and IT department to facilitate departmental reporting.
- Ingested data into Hadoop/Hive/HDFS from different data sources.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Utilized AWS services with a focus on big data, analytics, enterprise data warehouse and business intelligence solutions to ensure optimal scalability, flexibility, availability and performance, and to provide meaningful and valuable information for better decision-making. Experience in data cleansing and data mining.
- Designed AWS cloud migration using AWS EMR, DynamoDB and Redshift, and event processing using Lambda functions.
- Loaded all data from relational databases into Hive using Sqoop. Received four flat files from different vendors, each in a different format (e.g. text, EDI and XML).
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Wrote Hive join queries to fetch information from multiple tables, and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Worked across AWS cloud and on-premises environments, including infrastructure provisioning and configuration.
- Used Hive to analyze data ingested into HBase through Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
- Developed Shell, Perl and Python scripts to automate and provide control flow for Pig scripts. Designed the Redshift data model and carried out Redshift performance analysis and improvements.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Used Flume to collect, aggregate and store web log data from various sources such as web servers, mobile and network devices, and pushed it to HDFS. Implemented partitioning, dynamic partitions and buckets in Hive.
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them. Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
Confidential - Sunnyvale, CA
Java/Big Data Developer
- Coding and integration of several business-critical modules of application using Java, Spring, Hibernate and REST web services on WebSphere application server.
- Used Java Message Service (JMS) for reliable, asynchronous exchange of important information, such as payment status reports, on the IBM WebSphere MQ messaging system. Generated XML schemas and used XMLBeans to parse XML files.
- Participated in JAD meetings to gather requirements and understand the end users' system. Modified the existing JSP pages using JSTL.
- Delivered Big Data products, including re-platforming the legacy Global Risk Management System with Big Data technologies such as Hadoop, Hive and HBase. Worked with the NoSQL database MongoDB and worked heavily with Hive, HBase and HDFS.
- Developed RESTful web services using JAX-RS and used the DELETE, PUT, POST and GET HTTP methods in a Spring 3.0 and OSGi integrated environment.
- Worked on RESTful web services that enforce a stateless client-server model and support JSON, with a few changes from SOAP to RESTful technology. Involved in detailed analysis based on the requirement documents.
- Developed REST services to talk with adapter classes and exposed them to the AngularJS front end. Imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Used Spring JDBC DAO as the data access technology to interact with the database.
- Used the lightweight container of the Spring Framework to provide flexibility through inversion of control (IoC).
- Developed and implemented new UIs using AngularJS and HTML.
- Developed Spring configuration for dependency injection using Spring IoC and Spring controllers, implementing Spring MVC and IoC methodologies.
- Used JNDI for naming and directory services.
Environment: Java, J2EE, Java SE 6, UML, JSP 2.1, Hadoop 1.x, Hive, Pig, HBase, JSTL 1.2, Servlets 2.5, Spring MVC, Hibernate, JSON, RESTful web services, Big Data, jQuery, AJAX, AngularJS, JAXB, IRAD WebSphere Integration Developer, WebSphere 7.0, Eclipse Kepler-Maven, Serena Dimensions, Unix, JUnit, DB2, Oracle.
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, Design, Analysis and Code development.
- Developed Hibernate mappings using the DB model.
- Implemented Model-View-Controller (MVC) using the Struts and Spring frameworks. Involved in designing and developing customized tags using the JSP tag library.
- Developed a browser-based JavaServer Faces front end to an AS/400 system.
- Used Ajax to provide dynamic features where applicable.
- Implemented RESTful web services to communicate with components of other Sourcing systems within the firm and to provide data to the reporting team.
- Used the MVC pattern for GUI development in JSF and worked closely with the JSF lifecycle. Used Servlets and JSPs for real-time reporting that was too complex to be handled by Business Objects. Used Jira for bug tracking and project management.
- Developed database interaction code against the JDBC API, making extensive use of SQL query statements and advanced prepared statements.
- Developed the application front end using HTML, CSS, Backbone.js, JavaScript and jQuery.
- Used the AngularJS framework, in which data from the back end is stored in the model and populated to the UI. Prepared user documentation with screenshots for UAT (user acceptance testing).
- Hands-on experience with MVC JavaScript frameworks such as Backbone.js and AngularJS, and with Node.js. Implemented server-side tasks using Servlets and XML, and the Groovy Console.
- Developed SOAP web services through WSDL in Apache Axis to interact with other components. Implemented EJB session beans for business logic.
- Used parsers such as SAX and DOM for parsing XML documents and applied XML transformations using XSLT. Wrote stored procedures, triggers and cursors using Oracle PL/SQL.
- Implemented Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object and Service Locator.
- Interacted with clients to understand their needs and proposed designs to the team to implement the requirements. Developed new modules using the JSF 2.0 framework.
- Built the AngularJS application structure, including MVC, separate modules, specific controllers, templates, custom directives and custom filters.
- Extensively used XML, JSP, JavaScript, AJAX and Servlets to drive the application and request user input from the back end.
Environment: Java, JSP, JDBC, Cassandra, API, Python, jQuery, AngularJS, Web services, REST, Spring Core, Struts, Hibernate, Design Patterns, XML, Oracle, Apache Axis, ANT, JUnit, UML, SOAP, XSLT, Jira.