- Over 6 years of experience in the IT industry, including 4+ years in Big Data implementing complete Hadoop solutions.
- Working experience with Apache Hadoop ecosystem components such as MapReduce, HDFS, Impala, Hive, Sqoop, Pig, Oozie, Flume, HBase, and ZooKeeper.
- Strong experience in data analytics using Hive and Pig, including writing custom UDFs.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Wrote Spark applications in Python and Scala.
- Knowledge of job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Knowledge of writing MapReduce code in Java to meet business requirements.
- Extensive knowledge of SQL queries for back-end database analysis.
- Expertise in Core Java and Product Lifecycle Management tools.
- Experience in developing multi-tier Java-based web applications.
- Experience in developing Spring Boot applications for transformations.
- Good experience in developing applications using Java/J2EE technologies, including Servlets, Struts, JSP, and JDBC.
- Well-versed in Agile and other SDLC methodologies; able to coordinate with owners and SMEs.
- Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD).
- Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift; hands-on experience with the AWS CLI and SDK tools.
- Strong knowledge of Software Development Life Cycle (SDLC).
- Experienced in preparing and executing unit test plans and unit test cases after software development.
- Worked extensively in the Health and Automotive Insurance domains.
- Experienced in working in multicultural environments, both within a team and independently, as the project requires.
Confidential, Bluefield, VA
- Expertise in designing and deploying Hadoop clusters and Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, and Cassandra, with Hortonworks and Cloudera.
- Installed and configured Hadoop, MapReduce, HDFS, and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Analyzed business needs and functional specifications and mapped them to the design and development of MapReduce programs and algorithms.
- Deployed Spark jobs to Amazon EMR and ran them on AWS clusters.
- Wrote Pig and Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data. Hands-on experience with Pig and Hive User-Defined Functions (UDFs).
- Ran Hadoop ecosystem tools and applications through Apache Hue.
- Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability, and performance.
- Involved in migrating MapReduce jobs to Spark, using Spark SQL and the DataFrames API to load structured data into Spark clusters.
- Developed Oozie workflows for application execution.
- Performed feasibility analysis for the deliverables, evaluating the requirements against complexity and timelines.
- Migrated data from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Developed PySpark code to save data in Avro and Parquet formats and build Hive tables on top of them.
- Extensively involved in writing SQL queries (subqueries, nested queries, views, join conditions, removal of duplicates) in Impala/Hive, Oracle, and Spark SQL.
- Used Amazon Web Services (AWS) such as EC2 and S3 for small data sets.
- Implemented AWS computing and networking services to meet application needs.
- Strong experience in implementing data warehouse solutions in Amazon Web Services (AWS) Redshift; worked on various projects to migrate data from on-premises databases to AWS Redshift, RDS, and S3.
- Implemented Hive tables and HQL queries for reports. Wrote and used complex data types in Hive. Stored and retrieved data using HQL. Developed Hive queries to analyze reducer output data.
- Highly involved in designing the next-generation data architecture for unstructured data.
- Developed Pig Latin scripts to extract data from source systems.
- Extracted data from Hive and loaded it into an RDBMS using Sqoop.
- Documented operational problems, following standards and procedures, using the software reporting tool JIRA.
Environment: CDH4, AWS, EC2, EMR, HDFS, MapReduce, Hive, Oozie, Java, Impala, Pig, Shell Scripting, Linux, Hue, Sqoop, Flume, DB2, Oracle, Python, Scala.
Confidential, Sunrise, FL
- Responsible for building scalable distributed data solutions using Hadoop
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files
- Worked extensively with Flume for importing social media data
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager
- Analyzed data by running Hive queries and Pig scripts to understand user behavior
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Imported data from various sources using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS
- Hands-on experience productionizing Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning
- Involved in migrating MapReduce jobs to Spark, using Spark SQL and DataFrames API to load structured data into Spark clusters.
- Created final tables in Parquet format, using Impala to create and manage them.
- Created S3 buckets and managed their policies; used S3 and Glacier for storage and backup on AWS.
- Created HBase tables to store various data formats of PII data coming from different portfolios
- Processed streaming data using Storm.
- Provided cluster coordination services through ZooKeeper
Environment: Hadoop, MapReduce, HDFS, Hive, Spark, Java, Python, Scala, SQL, Cloudera Manager, Impala, Pig, Sqoop, Oozie, ZooKeeper, PL/SQL, MySQL, Windows, HBase, AWS EMR, AWS S3
Confidential, Cincinnati, OH
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing
- Imported and exported data to and from HDFS and Hive using Sqoop
- Used multithreading, synchronization, caching, and memory management
- Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC)
- Proactively monitored systems and services; involved in the architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it to HDFS
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Supported MapReduce programs running on the cluster
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions
- Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs
- Used Java and MySQL day to day to debug and fix issues with client processes
- Managed and reviewed log files
- Implemented partitioning, dynamic partitions, and buckets in Hive
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, CouchDB, Flume, HTML, XML, SQL, MySQL, J2EE, Eclipse
Confidential, Yonkers, NY
Java/ J2EE developer
- Responsible for understanding the scope of the project and requirement gathering.
- Reviewed and analyzed the design and implementation of software components/applications and outlined the development process strategies.
- Coordinated with project managers, development, and QA teams during the course of the project.
- Used Spring JDBC to write DAO classes that access account information in the database.
- Developed web services using the Spring Framework and Axis, including design of the XML request/response structure.
- Implemented the Hibernate/Spring framework for the database and business layers.
- Configured Oracle with Hibernate; wrote Hibernate mapping and configuration files for database operations (create, update, select).
- Used the Tomcat web server for development.
- Used Oracle as the database and Toad for query execution; also wrote SQL scripts and PL/SQL code for procedures and functions.
- Used CVS and Perforce as configuration management tools for code versioning and release.
- Developed the application in Eclipse and used Maven as the build and deployment tool.
- Used Log4j to print debug, warning, and info log messages to the server console.
- Extensively used Core Java, Servlets, JSP, and XML.
- Configured and deployed code to different environments: Integration, QA, and UAT.
- Created test cases for JUnit testing.
Environment: Java, J2EE, XML, Spring, Hibernate, Design Patterns, Log4j, CVS, Maven, Eclipse, Apache Tomcat, JUnit, Oracle