- Experienced and self-motivated Big Data Developer with over 5 years of extensive experience in the Big Data ecosystem and Enterprise Data Warehouse systems. Agile development experience with a proven track record of successful implementations.
- Experience in using various Hadoop ecosystem tools, particularly Spark, Pig, Hive, Oozie, Sqoop, Flume/Flume-NG, ZooKeeper, Impala, SQL databases, Hue and YARN.
- Excellent knowledge of the Hadoop ecosystem, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
- Experience developing MapReduce programs with Apache Hadoop for working with Big Data.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- In-depth understanding of Hadoop architecture and its various components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNodes and MapReduce concepts.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Hands-on experience writing Pig Latin scripts and Pig commands.
- Thorough knowledge of Monitoring, Replication and Sharding Techniques in MongoDB.
- Experienced in designing partitions in cubes to improve performance using SSAS.
- Strong knowledge in working with UNIX/LINUX environments, writing shell scripts and PL/SQL Stored Procedures.
- Good knowledge of Hibernate for mapping Java classes to database tables and using Hibernate Query Language (HQL).
- Worked on Java/J2EE systems with various databases, including Oracle, MySQL and DB2.
- Experience in designing and maintaining high-performing ELT/ETL processes.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases.
- Committed hardworking individual with strong communication and organizational skills.
- Ability to adapt to evolving technology and a strong sense of responsibility.
Big Data Developer
Confidential, Indianapolis, IN
- Worked on analyzing the Hadoop cluster and various Big Data analytics tools, including MapReduce, Hive and Spark.
- Prepared Linux shell scripts to automate processes and implemented Impala for data analysis.
- Implemented batch processing of data sources using Apache Spark.
- Executed Spark RDD transformations and actions as per business analysis needs.
- Migrated Hive queries to Spark SQL to improve performance.
- Developed predictive analytics using the Apache Spark Scala API.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (Text, Avro, SequenceFile, XML, JSON and Parquet).
- Wrote Pig Latin scripts for the analysis of semi-structured data.
- Performed optimization on Pig scripts and Hive queries to increase efficiency and add new features to existing code.
- Created Hive tables, loaded data and wrote Hive UDFs (see the sketch after this section).
- Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files.
- Experienced in data cleansing and processing using Pig Latin operations and UDFs.
- Wrote Hive Scripts for analyzing data in Hive warehouse using HiveQL.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
Environment: HDFS, CDH 5.1.2, Apache Spark 1.4.0, Pig, Hive, Sqoop, SQL, Shell scripting, Java 7.0, Oracle 10g/11g.
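A minimal sketch of the kind of Hive UDF written for this role, assuming the classic UDF base class available in this Hive/CDH version; the class name and cleanup rule are hypothetical examples, not the actual production code.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example UDF: trims whitespace and lowercases a string column.
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs so Hive semantics stay intact
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

A function like this would be registered with ADD JAR and CREATE TEMPORARY FUNCTION, then called inline in HiveQL like any built-in.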
Hadoop/Spark Developer
Confidential, Albany, NY
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Installed/Configured/Maintained Hortonworks Hadoop clusters for application development.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed and executed shell scripts to automate jobs, and wrote complex Hive queries and UDFs.
- Worked on reading multiple data formats on HDFS using Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed multiple POCs using Spark, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed solutions to implement them using Spark.
- Involved in loading data from the UNIX file system to HDFS and AWS S3.
- Extracted the data from Teradata into HDFS using Sqoop.
- Handled importing of data from various sources such as AWS S3 and MongoDB, performed transformations using Hive, MapReduce and Spark, and loaded the data into HDFS.
- Managed and reviewed Hadoop log files.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive, using AWS EMR (a producer sketch follows this section).
- Used Apache Atlas to exchange metadata between MariaDB and Hive.
- Facilitated daily scrum meetings, sprint planning, sprint reviews and sprint retrospectives.
- Worked on the core and Spark SQL modules of Spark extensively.
- Experienced in running Hadoop streaming jobs to process terabytes of data from AWS S3.
- Implemented Oozie jobs for importing real-time data into Hadoop using Kafka and for daily imports.
Environment: Hadoop, HDFS, Hive, Scala, Spark, SQL, MongoDB, MariaDB, UNIX Shell Scripting, AWS S3, EMR, Hortonworks HDP 2.5, Hadoop Stack, Apache Ranger and Apache Atlas.
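A minimal sketch of a Kafka producer of the kind developed in this role; the broker address, topic name and payload are hypothetical placeholders used only to illustrate the producer API, with downstream consumers or Spark/Storm jobs assumed to land the events in HDFS.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address and serializers; values here are placeholders for illustration.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One record per event; the key controls partition assignment on the topic.
            producer.send(new ProducerRecord<>("events", "event-key", "{\"msg\":\"sample payload\"}"));
            producer.flush();
        }
    }
}
```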
Confidential, Lexington, KY
- Involved in loading data from LINUX file system to HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Experience in managing and reviewing Hadoop log files and in managing and scheduling jobs on a Hadoop cluster.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this section).
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data on HDFS.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in writing Hive scripts to extract, transform and load the data into Database.
- Used JIRA for bug tracking and used CVS for version control.
Environment: Hadoop, Hive, Linux, MapReduce, HDFS, Pig, Sqoop, Shell Scripting, Python, Java 6 (JDK 1.6), Eclipse, Control-M scheduler, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, JIRA 5.1/5.2, CVS.
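A minimal sketch of the sort of data-cleaning MapReduce job referenced above; the delimiter, field count and validity rule are hypothetical, shown only to illustrate the mapper-side filtering approach (a map-only job, so the driver would set the reducer count to zero).

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drop malformed records and emit trimmed lines.
public class CleansingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        // Hypothetical rule: keep only records with the expected number of delimited fields.
        if (!line.isEmpty() && line.split(",", -1).length == 5) {
            context.write(new Text(line), NullWritable.get());
        }
    }
}
```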
- Configured Hibernate ORM as the persistence layer to provide persistence services and persistent objects to the application from the database.
- Developed the DAO layer using Spring MVC configuration XMLs for Hibernate to manage CRUD operations such as insert, update and delete.
- Implemented reusable services using BPEL to transfer data.
- Implemented dependency injection using the Spring framework.
- Developed a Web based reporting system with JSP, DAO and Apache Struts-Validator using Struts framework.
- Designed the controller using Servlets.
- Developed JUnit classes and created JUnit test cases.
- Configured logging (enable/disable) using log4j for the application.
- Created the user interface using HTML, CSS, JSP, jQuery, AJAX, JavaScript and JSTL.
- Implemented database operations using PL/SQL procedures and queries.
- Developed shell scripts for UNIX environment to deploy EAR and read log files.
- Implemented log4j for logging.
- Involved in designing and developing modules on both the client and server sides.
- Interacted with external services to retrieve user information using SOAP web service calls.
- Developed web components using JSP, Servlets and JDBC.
- Performed technical analysis, design, development and documentation with a focus on implementation and agile development.
- Accessed backend database Oracle using JDBC.
- Developed and wrote UNIX Shell scripts to automate various tasks.
- Developed user and technical documentation.
- Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.
- Developed core Spring components for some of the modules and integrated them with the existing Struts framework.
- Actively participated in testing and designed user interface using HTML and JSPs.
- Implemented database connectivity to Oracle using JDBC; designed and created tables using SQL.
- Implemented server-side processing using Java Servlets (see the sketch below).
- Installed and configured the Apache Web server and deployed JSPs and Servlets in Tomcat Server.
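A minimal sketch of the servlet-plus-JDBC pattern described above; the URL parameters, connection string, credentials, table and column names are hypothetical placeholders rather than the application's actual code, and a real deployment would use a pooled DataSource instead of DriverManager.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet: looks up a user by id in Oracle via JDBC and writes the name back.
public class UserLookupServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String userId = req.getParameter("id");
        resp.setContentType("text/plain");
        PrintWriter out = resp.getWriter();
        // Placeholder connection details for illustration only.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "app_password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT user_name FROM users WHERE user_id = ?")) {
            ps.setString(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                out.println(rs.next() ? rs.getString("user_name") : "not found");
            }
        } catch (SQLException e) {
            throw new ServletException("Database lookup failed", e);
        }
    }
}
```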