- Over 9 years of IT experience in various domains with Hadoop Ecosystems and Java J2EE technologies.
- Strong hands-on experience with Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using Scala and Python.
- Solid understanding of RDD operations in Apache Spark, including transformations and actions, persistence (caching), accumulators, broadcast variables, and optimizing broadcasts.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG Scheduler, Task Scheduler, stages, and tasks.
- Experience in exposing Apache Spark as web services.
- Good understanding of the driver, executors, and the Spark web UI.
- Experience in submitting Apache Spark and MapReduce jobs to YARN.
- Experience in real-time processing using Apache Spark and Kafka.
- Migrated Python machine learning modules to scalable, high-performance, and fault-tolerant distributed systems such as Apache Spark.
- Strong experience in Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning.
- Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
- Good expertise in coding in Python, Scala, and Java.
- Good understanding of the MapReduce framework architectures (MRv1 and YARN).
- Good knowledge and understanding of Hadoop architecture and various components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Sqoop, and Hive.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Hands-on experience in cleansing semi-structured and unstructured data using Pig Latin scripts.
- Good working knowledge of creating Hive tables and using HQL for data analysis to meet business requirements.
- Experience in managing and reviewing Hadoop log files.
- Good working experience with NoSQL databases such as Cassandra and MongoDB.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Experience in scheduling time-driven and data-driven Oozie workflows.
- Used ZooKeeper on distributed HBase for cluster configuration and management.
- Worked with the Avro data serialization system.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Experience in writing shell scripts to dump shared data from landing zones to HDFS.
- Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Expertise in client-side design and validation using HTML and JavaScript.
- Excellent communication and interpersonal skills; detail-oriented, analytical, time-bound, and responsible team player who coordinates well in a team environment, with a high degree of self-motivation and the ability to learn quickly.
Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB
Big Data Distributions: Cloudera, Amazon EMR
Programming languages: Core Java, Scala, Python, SQL, Shell Scripting
Operating Systems: Windows, Linux (Ubuntu)
Databases: Oracle, SQL Server
Designing Tools: Eclipse
Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate
Linux Experience: System Administration Tools, Puppet, Apache
Web Services: Web Service (RESTful and SOAP)
Frameworks: Jakarta Struts 1.x, Spring 2.x
Development methodologies: Agile, Waterfall
Logging Tools: Log4j
Application / Web Servers: CherryPy, Apache Tomcat, WebSphere
Messaging Services: ActiveMQ, Kafka, JMS
Version Tools: Git, SVN and CVS
Analytics: Tableau, SPSS, SAS EM and SAS JMP
Confidential, Peoria, Illinois
Hadoop/Spark Developer
- Used PySpark DataFrames to read text, CSV, and image data from HDFS, S3, and Hive.
- Worked closely with data scientists to build predictive models using PySpark.
- Developed Spark scripts using Scala shell commands as required.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Cleaned input text data using the PySpark machine learning feature extraction API.
- Created features to train algorithms.
- Used various algorithms from the PySpark ML API.
- Trained the model using historical data stored in HDFS and Amazon S3.
- Used Spark Streaming to load the trained model and predict on real-time data from Kafka.
- Stored the results in MongoDB.
- Enabled the web application to pick up the data stored in MongoDB.
- Used Apache Zeppelin for visualization of big data.
- Fully automated job scheduling, monitoring, and cluster management without human intervention using Webflow.
- Exposed Apache Spark as a web service using Flask.
- Migrated Python scikit-learn machine learning code to DataFrame-based Spark machine learning algorithms.
- Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in HDFS.
Environment: Spark Core, Spark SQL, Spark Streaming, Spark machine learning, Python, scikit-learn, Pandas DataFrame, AWS, Kafka, Hive, MongoDB, GitHub, Webflow, Amazon S3, Amazon EMR.
Confidential, Charlotte, North Carolina
- Imported data from our relational data stores to Hadoop using Sqoop.
- Created various MapReduce jobs for performing ETL transformations on transactional and application-specific data sources.
- Wrote Pig scripts and executed them using the Grunt shell.
- Worked on the conversion of existing MapReduce batch applications for better performance.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Worked on loading tables to Impala for faster retrieval using different file formats.
- Restructured the Java filtering program (the system was initially developed in Java) so that the business rule engine is packaged in a JAR that can be called from both Java and Hadoop.
- Created Reports and Dashboards using structured and unstructured data.
- Upgraded the operating system and/or Hadoop distribution using Puppet as new versions were released.
- Performed joins, group-by, and other operations in MapReduce using Java and Pig.
- Processed the output from Pig and Hive and formatted it before sending it to the Hadoop output file.
- Used Hive table definitions to map the output file to tables.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Wrote data ingesters and MapReduce programs.
- Reviewed HDFS usage and system design for future scalability and fault tolerance.
- Wrote MapReduce/HBase jobs.
- Worked with HBase, a NoSQL database.
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL.
Confidential, Salt Lake City, Utah
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Installed and configured Apache Hadoop, Hive, and HBase.
- Worked on Hortonworks cluster, which was used to process the big data.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Used Sqoop to transfer data between RDBMS and the Hadoop Distributed File System (HDFS) in both directions.
- Defined workflows using Oozie.
- Used Hive to create partitions on Hive tables and analyzed the data to compute various metrics for reporting.
- Created the data model for Hive tables.
- Managed and reviewed Hadoop log files.
- Used Pig as an ETL tool to do transformations, joins, and pre-aggregations before loading data onto HDFS.
- Worked on large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Installed and configured Hive and developed Hive UDFs to extend the core functionality of Hive.
- Responsible for loading data from UNIX file systems to HDFS.
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, HBase, Pig, Oozie, Linux, Java 7, Eclipse.
Confidential, Bridgewater, NJ
Sr. Java Developer
- Full life cycle experience including requirements analysis, high level design, detailed design, UMLs, data model design, coding, testing and creation of functional and technical design documentation.
- Used the Spring Framework for MVC architecture with Hibernate to implement DAO code, and used web services to interact with other modules and for integration testing.
- Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
- Designed the database and was involved in developing SQL scripts.
- Used SQL Navigator and was involved in testing the application.
- Implemented design patterns such as MVC-2, Front Controller, Composite View, and the Struts framework design patterns to improve performance.
- Used ClearCase and Subversion for source version control.
- Wrote Ant scripts to automate the builds and installation of modules.
- Wrote test plans and conducted unit tests using JUnit.
- Used Log4j for logging statements during development.
- Designed and implemented a log data indexing and search module, optimized for performance and accuracy, to provide full-text search over archived log data using the Apache Lucene library.
- Involved in testing and integrating the program at the module level.
- Worked with production support team in debugging and fixing various production issues.
Environment: Java 1.5, AJAX, XML, Spring 3.0, Hibernate 2.0, Struts 1.2, Web Services, WebSphere 7.0, JUnit, Oracle 10g, SQL, PL/SQL, Log4j, RAD 7.0/7.5, ClearCase, Unix, HTML, CSS, JavaScript.
- Worked with the business community to define business requirements and analyze the possible technical solutions.
- Handled requirements gathering, business process flows, business process modeling, and business analysis.
- Used UML and Rational Rose extensively to design various use cases, class diagrams, and sequence diagrams.
- Developed the application using the Spring MVC architecture.
- Developed custom tags for the table utility component.
- Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
- Designed and developed web pages using Servlets and JSPs, and used XML/XSL/XSLT as a repository.
- Involved in Java application testing and maintenance in development and production.
- Involved in developing the customer form data tables and maintaining customer support and customer data in MySQL database tables.
- Involved in mentoring specific projects in application of the new SDLC based on the Agile Unified Process, especially from the project management, requirements and architecture perspectives.
- Designed and developed view, model, and controller components implementing the MVC framework.