I have 9 years of IT experience across multiple domains, covering Big Data (Hadoop ecosystem technologies), core Java, and SQL/PL-SQL, with hands-on project experience in verticals including financial services, healthcare, and trade compliance.
- Around 9 years of IT experience in a variety of industries, including 4 years of hands-on experience in Big Data analytics and development.
- Good knowledge and experience of the System Development Life Cycle (SDLC), product development methodologies, database design concepts, and system integration strategies.
- Expertise with tools in the Hadoop (HDFS, YARN) ecosystem, including Pig, Hive, MapReduce, Sqoop, Storm, Spark, Flume, Kafka, Oozie, Impala, and ZooKeeper, as well as machine learning.
- Strong experience with Spark SQL UDFs, Spark SQL performance tuning, and Spark Streaming. Experienced in working with input file formats such as ORC, Parquet, SequenceFile, JSON, and Avro.
- Experienced in working with Amazon Web Services (AWS) using EMR for computing and S3 as storage.
- Excellent Java development skills using Java, J2EE, Servlets, JSP, EJB, JDBC, SOAP, and RESTful web services.
- Worked in large and small teams on systems requirements, design, and development.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
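As an illustration of the Spark SQL UDF and file-format experience summarized above, a minimal sketch in Scala (application name, paths, and column names are hypothetical, not taken from any actual project):

```scala
import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-sketch")
      .getOrCreate()

    // Register a simple UDF that normalizes free-text codes.
    spark.udf.register("normalize", (s: String) => s.trim.toLowerCase)

    // Read Parquet input (placeholder path) and apply the UDF in SQL.
    spark.read.parquet("s3://bucket/trades/").createOrReplaceTempView("trades")
    val cleaned = spark.sql("SELECT normalize(symbol) AS symbol, qty FROM trades")

    // Write the cleaned output in ORC, another of the formats listed above.
    cleaned.write.mode("overwrite").orc("s3://bucket/trades_clean/")

    spark.stop()
  }
}
```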
Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie, Elasticsearch
Hadoop Distribution/Monitoring: Cloudera, Hortonworks, Ambari, Cloudera Manager
NoSQL Databases: HBase, Cassandra, MongoDB
Relational Databases: Microsoft SQL Server, MySQL, Oracle, DB2
Languages: Java, Scala, SQL, PL/SQL, C, C++, Shell Scripting, Python
Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful, SOAP
Web Technologies: HTML, CSS, XML, JavaScript, jQuery
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Cloud Computing Tools: Amazon AWS: EC2, S3, IAM, Glacier, CloudFront, EMR
Operating Systems: UNIX, Windows, LINUX
Build Tools: Jenkins, Maven, ANT
ETL Tools: Informatica, Talend
Development Tools: Eclipse, NetBeans, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall
Version Control and Testing Tools: Git, SVN, and JUnit
Confidential, Chicago, IL
Senior Hadoop Developer
Environment: Spark Core, Spark SQL, Spark Streaming, Spark machine learning (MLlib), Scala, S3, EMR, Kafka, Hive, Sqoop, GitHub, Web flow, Talend
- Loaded and transformed large sets of structured, semi-structured, and unstructured data coming from different source systems and a variety of portfolios.
- Wrote shell scripts to move batch files from various sources to HDFS and S3, scheduled via crontab.
- Worked on incremental Batch Ingestion using Sqoop into HDFS.
- Implemented Streaming data Ingestion using Kafka.
- Used Spark Streaming as a Kafka consumer for preprocessing real-time data.
- Implemented partitions and buckets in Hive for optimizing query performance.
- Used Spark core API and Spark SQL in Scala as per business analysis needs.
- Used AWS S3 for data storage and EMR cluster for processing various jobs.
- Worked on Talend with Hadoop, including migration of existing Talend jobs.
- Used the Spark machine learning library to clean data and feed the input stream to a trained model.
- Migrated Python scikit-learn machine learning models to DataFrame-based Spark ML algorithms.
- Worked with the Oozie workflow manager to schedule Hadoop jobs, including resource-intensive jobs.
- Worked with team members in data visualization using Tableau.
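The Kafka-to-Spark-Streaming ingestion described above could look roughly like the following sketch, assuming Spark 2.x with the spark-streaming-kafka-0-10 connector; broker addresses, the topic name, consumer group, and output path are all placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object StreamingIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-ingest")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ingest-group",                   // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams) // placeholder topic
    )

    // Light preprocessing before handing records downstream:
    // drop empty messages, then stage each batch under a unique path.
    stream.map(_.value).filter(_.nonEmpty).foreachRDD { rdd =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///staging/events/${System.currentTimeMillis}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct stream maps Kafka partitions one-to-one onto Spark partitions, which is why this style is commonly preferred over receiver-based consumers for throughput and offset control.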
Hadoop with Spark Developer
Environment: Hadoop-HDFS, Hive, Sqoop, Kafka, Spark, Shell Scripting, HBase, Python, Zookeeper, Maven, Hortonworks, MySQL, Tableau
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Created Linux shell scripts to automate the daily ingestion of IVR data.
- Good experience handling data manipulation using Python scripts.
- Consumed data from Kafka using Apache Spark.
- Worked on POCs with Apache Spark using Scala to evaluate Spark for the project.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Extended Hive core functionality using custom user-defined functions (UDFs).
- Executed Hive queries on Parquet tables to perform data analysis meeting business requirements.
- Responsible for loading data files from external sources such as Oracle and MySQL into a staging area in MySQL databases.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Exported aggregated data to an RDBMS using Sqoop for building dashboards in Tableau.
- Installed and configured ZooKeeper to coordinate and monitor cluster resources.
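The Hive partitioning and bucketing mentioned above can be sketched as DDL issued through a Hive-enabled SparkSession; the table, columns, and bucket count are hypothetical examples, not the actual project schema:

```scala
import org.apache.spark.sql.SparkSession

object HiveLayout {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout")
      .enableHiveSupport() // required so DDL goes to the Hive metastore
      .getOrCreate()

    // Partition by ingestion date so queries prune whole directories;
    // bucket by call_id to spread rows evenly across a fixed file count.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS ivr_calls (
        |  call_id STRING,
        |  duration_sec INT
        |)
        |PARTITIONED BY (ingest_date STRING)
        |CLUSTERED BY (call_id) INTO 16 BUCKETS
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert from a hypothetical staging table:
    // each distinct ingest_date value lands in its own partition.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE ivr_calls PARTITION (ingest_date)
        |SELECT call_id, duration_sec, ingest_date FROM ivr_calls_stage""".stripMargin)

    spark.stop()
  }
}
```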
Confidential, San Diego, CA
Environment: Hadoop, Cloudera Manager, Teradata, MapReduce, HBase, SQL, Sqoop, HDFS, Flume, UML, Hive, Oozie, Cassandra, Maven, Pig, UNIX, Python, and Git.
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs to preprocess and cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, CSV, and SequenceFile formats for log files.
- Experienced in troubleshooting MapReduce jobs by reviewing log files.
- Implemented Hive generic UDFs to apply business logic to custom data types.
- Conducted data extraction, including analysis, review, and modeling based on requirements, using higher-level tools such as Hive and Pig.
- Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
- Created HBase tables to store various formats of data coming from different portfolios.
- Developed Oozie workflows for scheduling; used Tableau for visualization.
- Coordinated with the project manager to gather requirements and developed code to support new applications.
- Developed web applications using Java, J2EE, Struts, and Hibernate.
- Developed ActionForm classes, form beans, and Action classes using Struts.
- Used Hibernate for backend persistence.
- Involved in writing the Spring configuration XML file containing object declarations and dependencies.
- Implemented MVC and DAO J2EE design patterns as part of application development.
- Coded and maintained Oracle packages, stored procedures, and tables.
- Worked on web technologies including Tomcat, the Apache HTTP server, and web service architectures.
- Used SVN for software configuration management and version control.
- Prepared test cases and strategies for unit testing and integration testing.
- Used Log4j to log debug and exception statements.
Jr. Java Developer
Environment: Java 5, Struts, PL/SQL, SQL server, EJB, IntelliJ, Clear Case, Apache Tomcat, JSP, CSS.
- Developed Action servlets, ActionForm classes, and JavaBean classes implementing business logic for the Struts framework.
- Developed Servlets and JSPs based on the MVC pattern using the Struts Action framework.
- Developed all the tiers of the J2EE application. Developed data objects to communicate with the database using JDBC in the database tier, implemented business logic using EJBs in the middle tier.
- Developed persistence layer modules using EJB Java Persistence API (JPA) annotations and Entity manager.
- Developed Action and Form Bean classes to retrieve data and process server-side validations.
- Designed various tables required for the project in SQL server database and used Stored Procedures in the application. Used SQL Server to create, update and manipulate tables.
- Used IntelliJ as IDE and Clear Case for version control.