- Over 8 years of overall IT experience, including hands-on experience in Big Data technologies.
- 3 years of experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, ZooKeeper, Oozie, and Flume.
- Good exposure to MapReduce programming in Java, Pig Latin scripting, distributed applications, and HDFS.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, and HDFS high availability (HA).
- Expertise in data transformation and analysis using Pig, Hive, and Sqoop.
- Experienced in implementing Spark applications using Scala and Python.
- Configured ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
- Strong understanding of the Cassandra database.
- Experience in importing and exporting data with Sqoop between HDFS/Hive and relational database systems.
- Worked on NoSQL databases including HBase and MongoDB.
- Experienced in writing SQL, PL/SQL, stored procedures/functions, triggers, and packages on RDBMSs such as Oracle.
- Experience in implementing open-source frameworks such as Struts, Spring, Hibernate, and Web Services.
- Extensive experience working with Oracle and MySQL databases.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
- Good knowledge of Spark, Storm, and HBase for real-time streaming.
- In-depth knowledge of statistics, machine learning, and data mining.
- Experienced with big data machine learning using Mahout and Spark MLlib.
- Experienced in data cleansing using MapReduce and Spark jobs.
- Experienced with statistical tools such as MATLAB, R, and SAS.
- Experienced with supervised learning techniques such as multiple linear regression, nonlinear regression, logistic regression, artificial neural networks, support vector machines, decision trees, and random forests.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
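As a concrete illustration of the regression techniques listed above, the following is a minimal, library-free Python sketch of ordinary least squares for a single predictor; the data points are hypothetical and the fit shown is the textbook closed form, not any production model.

```python
# Minimal ordinary least squares fit for one predictor (y = a + b*x).
# Illustrative only: hypothetical data, no external libraries.

def ols_fit(xs, ys):
    """Return intercept a and slope b minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance(x, y) / variance(x)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]   # roughly y = 2x
a, b = ols_fit(xs, ys)
```

The same covariance/variance ratio underlies the multiple-regression case; libraries such as R or MLlib solve the multivariate analogue in matrix form.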
Big Data: Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Spark, Sqoop, Oozie, Zookeeper, Impala, Flume
IDE Tools: Eclipse, NetBeans
Java Technologies: Core Java, Servlets, JSP, JDBC, Collections
Databases: Oracle, MySQL, DB2, PostgreSQL, MongoDB
Operating Systems: Windows XP/Vista/7, Mac OS X, Linux, Unix
Logging Tools: Log4j
Version Control Systems: SVN, CVS, Git
Other Tools: PuTTY, WinSCP
Confidential, New York, NY
- Built a suite of Linux scripts as a framework for easily streaming data feeds from various sources onto HDFS.
- Wrote Interface specifications to ingest structured data into appropriate schemas and tables to support the rules and analytics.
- Extracted the data from Teradata into HDFS using Sqoop and exported the patterns analyzed back into Teradata using Sqoop.
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
- Used Hive schemas to create relations in Pig via HCatalog.
- Developed Java MapReduce and Pig cleansers for data cleansing.
- Analyzed and visualized data in Teradata using Datameer.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Implemented machine learning models such as K-means clustering using PySpark.
- Used Spark to create reports for analysis of data coming from various sources, such as transaction logs.
- Involved in the migration from Hadoop to Spark.
- Refactored existing Hive queries to Spark SQL.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Used Maven extensively to build JAR files for MapReduce programs and deploy them to the cluster.
- Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.
Environment: Hadoop, HDFS, Hive, Pig, MapReduce, YARN, Datameer, Flume, Oozie, Linux, Teradata, HCatalog, Java, Eclipse IDE, Git.
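The K-means clustering mentioned above was done with PySpark; as a plain-Python sketch of the underlying Lloyd iteration (hypothetical 1-D data, not the production MLlib job):

```python
# Plain-Python sketch of the K-means iteration. The actual work used
# PySpark; this only illustrates the algorithm on toy 1-D data.

def kmeans_1d(points, centers, iters=10):
    """Lloyd's algorithm: assign each point to the nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        centers = [sum(v) / len(v) if v else centers[c]
                   for c, v in clusters.items()]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]   # two obvious groups
centers = kmeans_1d(points, centers=[0.0, 5.0])
```

Spark parallelizes exactly these two steps: the assignment is a map over the partitioned points, and the center update is a reduce by cluster key.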
Confidential, New York, NY
- Worked on writing various Linux scripts to stream data from multiple data sources like Oracle and Teradata onto the data lake.
- Built the infrastructure that secures all data in transit on the data lake, helping to ensure the utmost security of customer data.
- Extended the Hive framework through custom UDFs to meet requirements.
- Actively worked with the Hadoop administration team to debug slow-running MapReduce jobs and apply the necessary optimizations.
- Took the initiative to learn eCDW, a custom AT&T scheduler, to schedule periodic runs of various scripts for initial and delta loads of various datasets.
- Played a key role in mentoring the team on developing MapReduce jobs and custom UDFs.
- Played an instrumental role in helping the team leverage Sqoop to extract data from Teradata.
- Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
- Developed job flows in TWS to automate the workflow for extraction of data from Teradata and Oracle.
- Actively involved in building a generic Hadoop framework that enables various teams to reuse best practices.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Helped the team in optimizing Hive queries.
- Worked on a massive cluster with NameNode high availability (HA).
- Sized the cluster to support 400 TB of data, taking the growth rate into consideration.
- Secured cluster with Kerberos.
- Used LDAP for role based access.
Environment: Apache Hadoop, MapReduce, HDFS, Hive, Sqoop, Linux, JSON, Oracle 11g, PL/SQL, Eclipse, SVN, Teradata Client, TWS (Tivoli Workload Scheduler).
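The MapReduce compression tuning above used Hadoop's codecs; the basic ratio-versus-cost trade-off can be felt even with the Python stdlib equivalents of two common codecs (illustrative data, not the actual cluster configuration):

```python
import bz2
import gzip

# Hadoop jobs can write intermediate and final output with different
# codecs; this only demonstrates the compression trade-off on
# repetitive, log-like sample data using stdlib gzip and bz2.
sample = b"user_id,event,timestamp\n" * 10_000

gz = gzip.compress(sample)
bz = bz2.compress(sample)

ratio_gz = len(gz) / len(sample)
ratio_bz = len(bz) / len(sample)
# Both shrink repetitive data dramatically. In Hadoop, bzip2 tends to
# compress tighter but costs more CPU, and bzip2 output is splittable
# across mappers while plain gzip output is not.
```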
Confidential, Louisville, KY
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Setup and benchmarked Hadoop, HBase clusters for internal use.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Used UDFs to implement business logic in Hadoop.
- Used Hive for managing customer and franchise information in the tables.
- Imported data using Sqoop into Hive and HBase from an existing DB2 database.
- Used Ganglia to monitor the distributed cluster network.
- Used Jira for bug tracking.
- Used Git to check-in and checkout code changes.
Environment: Java, Hadoop, HDFS, MapReduce, Pig, Sqoop, Hive, HBase, DB2, Eclipse.
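The MapReduce jobs above ran on Hadoop via Hive and Pig; the pattern they follow can be sketched as a plain-Python simulation of the map, shuffle, and reduce phases (toy in-memory data, not the production jobs):

```python
from collections import defaultdict

# Plain-Python simulation of the MapReduce word-count pattern:
# map emits key/value pairs, shuffle groups by key, reduce aggregates.

def map_phase(lines):
    """Emit (word, 1) pairs, as a mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Group values by key, as the framework's shuffle/sort does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts per key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["franchise order placed", "order shipped", "order placed"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

Hive and Pig generate jobs of exactly this shape; a GROUP BY / COUNT query compiles down to the same map, shuffle, and reduce stages.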
Confidential, Jacksonville, FL
- Involved in requirements gathering and creating functional specifications by interacting with business users.
- Responsible for analysis, design, development and unit testing.
- Implemented the application using Spring MVC. Used Spring Batch for handling the parallel processing of batch jobs.
- Followed MVC and DAO design patterns for developing the application; all database calls were handled through DAOs.
- Improved performance of application using Hibernate for retrieving, manipulating and storing millions of records.
- Performed unit and integration testing before checking in code for the QA builds.
- Involved in Production Support.
- Used Log4j and commons-logging frameworks for logging the application flow.
- Involved in developing build scripts using Ant.
- Used CVS for version control.
Confidential, Dublin, OH
Jr. J2EE Developer
- Worked with Onsite and Offshore team to coordinate the knowledge transfer and work.
- Developed session beans using EJBs for business logic in the middle tier.
- Wrote SQL and stored procedures to extract data from the Oracle enterprise data model.
- Wrote various Java classes for user registration.
- Used JDBC API to access database.
- Involved in the generation of reports.
- Performed build release planning and coordination for QA testing and actual deployment.
- Participated in unit testing (using JUnit) and integration testing.
Environment: Java, JDBC, XML, Log4j, Ant, Oracle 9i, TOAD, Solaris, AIX, Windows.