- Over 6 years of professional IT experience, including 2 years in Big Data ecosystems and 4 years in software development, with continuous work experience in Java.
- Domain expertise in financial services, banking, and technology.
- In-depth knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, High Availability, and YARN, and a good understanding of workload management, schedulers, scalability, and distributed platform architectures.
- Proficient in Java, and in Scala for Apache Spark.
- Technical expertise in Big Data/Hadoop: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Oozie, Kafka, and NoSQL.
- Experience developing MapReduce jobs with the Hadoop Java API.
- Experience developing Spark applications in Scala.
- Extensive experience importing and exporting data with Sqoop between HDFS/Hive/HBase and relational database systems (RDBMS).
- Experience collecting, aggregating, and moving large amounts of streaming data using Flume, Kafka, RabbitMQ, and Spark Streaming.
- Extensive experience writing Pig scripts and Hive queries to process and analyze large volumes of data with varying levels of structure.
- Strong experience writing custom UDFs in Java to extend the functionality of Hive and Pig.
- Exposure to the HBase distributed database and the ZooKeeper distributed configuration service.
- Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD), and Java components such as the Collections Framework, exception handling, and the I/O system.
- Strong database experience with RDBMS (SQL Server, MySQL), with PL/SQL programming skills in creating packages, stored procedures, functions, triggers, and cursors.
- Strong experience in Machine Learning and Data Mining using R.
- Experience with data visualization and BI tools: Tableau, Talend, SSIS, Qlik Sense, and MicroStrategy.
- Experience in Agile, Waterfall, and Scrum development environments using Git, JIRA, and Jenkins.
Big Data Ecosystem: Hadoop 2.6.3, Spark 1.6.2, MapReduce 1.0, YARN, Hive 2.0.0, Pig 0.15.0, HBase 1.1.4, Flume 1.5.0, Sqoop 1.4.6, ZooKeeper 3.5.2, Kafka 0.10.1.1, RabbitMQ 3.6.6, Oozie 4.2
Languages: Java 1.5/1.6/1.7/1.8, Scala 2.11.5, R, SQL, HiveQL, Pig Latin
NoSQL Databases: Couchbase, Cassandra, HBase
Relational Databases: MySQL 5.6.x, SQL Server 2005/2008/2012, Oracle 11g/10g/9i, PostgreSQL 8.0
Business Intelligence: Tableau, Talend, SSIS, Qlik Sense, MicroStrategy
Web Technologies: J2EE (Servlets, JSP, Struts 2, Spring 4.0, JDBC, ODBC, Hibernate 4), HTML/CSS, XML
Others: Cloudera Distribution Hadoop (CDH), Docker, IntelliJ IDEA, Eclipse IDE, NetBeans 8.1
Confidential, San Diego, CA
Big Data Engineer
- Involved in Scrum meetings and releases, working closely with teammates and managers.
- Developed RESTful APIs providing microservices using the Spring Boot framework.
- Built services on the Docker/CoreOS/Mesosphere ecosystem.
- Performed unit testing using JUnit and sanity testing using Postman.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into Couchbase.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory configuration.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Experienced in handling large datasets during the ingestion process itself using partitions, Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other techniques.
- Used Couchbase for scalable storage and fast query.
- Involved in application performance tuning and troubleshooting.
- Used Git for version control and JIRA for project tracking.
Environment: Couchbase 4.6.3, Spark 1.6.2, Sqoop 1.4.6, Flume 1.5.0, HBase 1.1.4, MySQL 5.6, Scala 2.11.x, Kafka 2.1, Spring Boot, IntelliJ, Agile/Scrum
Confidential, New York City, NY
Senior Big Data Developer
- Extensively involved in the installation and configuration of the Cloudera Distribution Hadoop (CDH) platform.
- Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) using Spark DataFrames.
- Used Spark SQL to extract and process data, parsing it with Datasets or RDDs in HiveContext and applying transformations and actions (map, flatMap, filter, reduce, reduceByKey).
- Extended the capabilities of DataFrames using User-Defined Functions (UDFs) in … and Scala.
- Resolved missing fields in DataFrame rows using filtering and imputation.
- Integrated visualizations into a Spark application using Databricks and popular visualization libraries (ggplot, matplotlib).
- Trained analytical models with Spark ML estimators, including linear regression, decision trees, logistic regression, and k-means.
- Preprocessed datasets prior to training, including standardization and normalization.
- Built processing pipelines covering the transformation, estimation, and evaluation of analytical models.
- Evaluated model accuracy by dividing data into training and test datasets and computing metrics using evaluators.
- Tuned training hyper-parameters by integrating cross-validation into pipelines.
- Used Spark MLlib functionality not available in Spark ML by converting DataFrames to RDDs and applying RDD transformations and actions.
- Troubleshot and tuned machine learning algorithms in Spark.
Environment: Spark 1.6.2, Spark MLlib, Spark ML, Hive 1.2.1, Sqoop 1.4.6, Flume 1.5.0, HBase 1.1.4, MySQL 5.6, Scala 2.11.x, Shell Scripting, Tableau 9.2, Agile
Confidential, Boston, MA
Big Data Engineer
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java, and scripts in Hive and Pig.
- Analyzed data by running Hive queries (HiveQL) and Pig scripts (Pig Latin), which were also used for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Experienced in loading and transforming large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast query.
Environment: Hadoop, HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java
- Involved in system design based on the Spring, Struts, and Hibernate frameworks.
- Implemented the business logic in standalone Java classes using core Java.
- Developed database (SQL Server) applications.
- Used Spring's HibernateTemplate to access the SQL Server database.
- Designed, implemented, and tested new features using T-SQL.
- Optimized existing data aggregation and reporting for better performance.
- Performed various analyses to support organizational and client improvements.
- Designed and coded application components with JSP, Servlets, and AJAX.
- Implemented data persistence using JDBC for database connectivity and Hibernate for database/Java object mapping.
- Designed the logical and physical data models and generated DDL and DML scripts.
- Wrote SQL queries, stored procedures and database triggers as required on the database objects.
Environment: Java, XML, Hibernate, SQL Server, Maven 2, JUnit, J2EE (JSP, JavaBeans, DAO), Eclipse, Apache Tomcat Server, Spring MVC, Spiral Methodology
- Developed the XML data parsing system.
- Developed the system UI using Java Swing.
- Developed with the Struts/Hibernate frameworks as the MVC layer.
- Developed SQL queries against an Oracle database.