- Around 8 years of total IT experience which includes Java Development, Web application Development, Database Management and Big Data ecosystem technologies.
- Around 4 years of strong, hands-on skills developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Hands-on experience with Hadoop application administration, management, monitoring, debugging, and performance tuning.
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Strong experience designing and implementing MapReduce jobs to support distributed processing of large data sets on Hadoop clusters.
- Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
- Worked with multiple Hadoop distributions, including Cloudera and Hortonworks.
- Solid experience working in cloud environments such as Amazon Web Services (AWS) EMR, EC2, and S3.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
- Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce, and AWS CloudFormation.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, using Spark to extract the schema of the JSON files.
- Configured Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Implemented POCs to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Good knowledge of scripting languages such as Linux/Unix shell and Python; continuous integration and automated deployment and management using Jenkins and uDeploy.
- Hands-on experience developing UDFs, DataFrames and SQL queries in Spark SQL.
- Proficient in Data Warehousing, Data Mining concepts and ETL transformations from source to target systems.
- Imported data from various sources, such as AWS S3 and the local file system, into Spark RDDs.
- Implemented database-driven applications in Java using JDBC, JSON and XML APIs and the Hibernate framework.
- Deployed and implemented distributed enterprise applications in J2EE environments.
- Comprehensive knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements analysis, design, development and testing.
- Proficient in PL/SQL implementations of data warehousing concepts, with strong experience applying data warehousing methodologies.
- Experienced in Java multithreaded programming, developing multithreaded modules and applications, and in developing multi-tier distributed applications using Java and J2EE technologies.
- Worked within software life cycle methodologies such as Agile and Waterfall, estimating timelines for projects.
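The Spark work summarized above (filtering JSON source data from S3 and storing it in HDFS with partitions) follows a common filter-then-partition shape. A minimal pure-Python sketch of that logic, independent of Spark itself; the field names (`dt`, `amount`) and the filter predicate are hypothetical, since the actual schema is not given:

```python
import json

def filter_and_partition(records, min_amount, partition_key):
    """Filter JSON records and group them by a partition column,
    mirroring the filter + partitioned-write a Spark job performs."""
    partitions = {}
    for line in records:
        rec = json.loads(line)
        if rec.get("amount", 0) < min_amount:  # hypothetical filter predicate
            continue
        # Hive-style partition directory name, e.g. "dt=2017-01-01"
        part = f"{partition_key}={rec[partition_key]}"
        partitions.setdefault(part, []).append(rec)
    return partitions

rows = ['{"dt": "2017-01-01", "amount": 10}',
        '{"dt": "2017-01-02", "amount": 3}']
parts = filter_and_partition(rows, min_amount=5, partition_key="dt")
```

In Spark proper the same effect comes from a DataFrame filter followed by a partitioned write; the sketch only shows the grouping logic.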
Big Data Skillset - Frameworks & Environments: Cloudera CDH, Hortonworks HDP, Hadoop 1.0, Hadoop 2.0, HDFS, MapReduce, Pig, Hive, Impala, HBase, Data Lake, Cassandra, MongoDB, Mahout, Sqoop, Oozie, Zookeeper, Flume, Splunk, Spark, Storm, Kafka, YARN, Falcon, Avro.
Amazon Web Services (AWS): Elastic MapReduce (EMR 4.1.0) clusters, EC2 instances, Airflow, Amazon S3, Amazon Redshift, Ganglia, EMRFS, s3cmd (batches), Ruby EMR utility (monitoring), Kinesis (streams), AWS CodeCommit, AWS CodeDeploy, AWS CodePipeline, Amazon CloudFront, AWS Import/Export.
IDE Tools: Eclipse, NetBeans, Spring Tool Suite, Hue (Cloudera specific).
Databases & Application Servers: Oracle 8i/9i/10g/11i, MySQL, DB2 8.x/9.x, Microsoft SQL Server 2000, MS Access, Teradata, PostgreSQL, Cassandra, MongoDB, HBase.
Other Tools: Putty, WinSCP, Data Lake, Talend, Tableau, GitHub, SVN, CVS.
Sr. Hadoop Developer
Confidential, Orlando, FL
- Worked on a live 24 node Hadoop cluster running on HDP 2.2.
- Built import and export data jobs to copy data to and from HDFS using Sqoop.
- Worked with Sqoop (version 1.4.3) jobs with incremental loads to populate HAWQ external tables and load them into internal tables.
- Created external and internal tables using HAWQ.
- Worked with the Spark Core, Spark Streaming and Spark SQL modules of Spark.
- Hands-on experience with big data application phases such as data ingestion, data analytics and data visualization.
- Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Very well versed in workflow scheduling and monitoring tools such as Oozie, Hue and Zookeeper.
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Installed and configured MapReduce, Hive and HDFS; implemented CDH5 and HDP clusters on CentOS.
- Assisted with performance tuning, monitoring and troubleshooting.
- Experience collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience processing streaming data into clusters through Kafka and Spark Streaming.
- Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
- Tested Apache TEZ, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
- Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
- Experience in reviewing Hadoop log files to detect failures.
- Hands on experience in fixing the production Job failures.
- Worked on RCA documentation for production failures.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Worked with Pig, Sqoop and the NoSQL database HBase to analyze the Hadoop cluster and big data.
- Knowledge of workflow/schedulers like Oozie/crontab/Autosys.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Creating Hive tables and working on them for data analysis to meet the business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform and analyze data.
- Experience in using Sequence files, RCFile, AVRO and HAR file formats.
- Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
- Used Flume to dump application server logs into HDFS.
- Automated backups with Linux shell scripts to transfer data to S3 buckets.
- Worked with data modeling teams. Created conceptual, logical & physical models.
- Hands on experience working as production support Engineer.
- Hands-on experience using HP ALM; created test cases and uploaded them into HP ALM.
- Automated incremental loads to load data into production cluster.
Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.
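The Sqoop incremental loads described in this role follow a standard pattern. A hedged sketch that assembles such a command as an argument list; the JDBC URL, table and column names are hypothetical, while the flags (`--incremental append`, `--check-column`, `--last-value`) are standard Sqoop 1.4 import options:

```python
def sqoop_incremental_import(jdbc_url, table, check_column, last_value, target_dir):
    """Build a Sqoop incremental-append import command as an argument list."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC URL of the source RDBMS
        "--table", table,
        "--incremental", "append",    # only pull rows newer than --last-value
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--target-dir", target_dir,   # HDFS landing directory
    ]

cmd = sqoop_incremental_import(
    "jdbc:mysql://db.example.com/sales",  # hypothetical source
    "orders", "order_id", 1000, "/data/landing/orders")
```

In practice such jobs are saved as named Sqoop jobs so that `--last-value` is tracked automatically between runs.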
Confidential, Milwaukee, WI
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Implemented the Hortonworks distribution (HDP 2.1, HDP 2.2 and HDP 2.3).
- Developed MapReduce programs for refined queries on big data.
- Experienced in working with Elastic MapReduce (EMR).
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Worked with business team in creating Hive queries for ad hoc access.
- In depth understanding of Classic MapReduce and YARN architectures.
- Wrote Hive generic UDFs to implement business logic.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDF's to pre-process the data for analysis.
- Deployed Cloudera Hadoop Cluster on AWS for Big Data Analytics.
- Analyzed the data by performing Hive queries, ran Pig scripts, Spark SQL and Spark Streaming.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Apache NiFi to copy data from the local file system to HDFS.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
- Involved in continuous monitoring of operations using Storm.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented indexing for logs from Oozie to Elastic Search.
- Design, develop, unit test, and support ETL mappings and scripts for data marts using Talend.
Environment: Hortonworks, Hadoop, Map Reduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, GitHub.
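The Kafka-to-Spark-Streaming script described above relies on the micro-batch model: messages arriving on a topic are grouped into fixed-width time windows and each window is processed as one batch. A minimal pure-Python sketch of just that windowing idea (the interval and event times are illustrative; real Spark Streaming does this via its batch interval):

```python
def micro_batches(events, interval):
    """Group (timestamp, message) pairs into fixed-width windows,
    mirroring how Spark Streaming turns a Kafka stream into batches."""
    batches = {}
    for ts, msg in events:
        window = ts - (ts % interval)  # start time of the batch window
        batches.setdefault(window, []).append(msg)
    # Return batches in time order, as the streaming engine would process them.
    return [batches[w] for w in sorted(batches)]

stream = [(0, "click"), (1, "view"), (5, "click"), (6, "buy")]
out = micro_batches(stream, interval=5)
```

Each resulting list would correspond to one RDD handed to the processing logic per batch interval.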
Confidential, Bloomington, IL
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS from Oracle database and vice versa using Sqoop.
- Experience in installing, configuring Hadoop cluster for major Hadoop distributions.
- Experience using Hive and Pig as ETL tools for event joins, filters, transformations and pre-aggregations.
- Created partitions and buckets by state in Hive to handle structured data.
- Implemented dashboards that handle HiveQL queries internally, including aggregation functions, basic Hive operations and various join operations.
- Implemented state-based business logic in Hive using generic UDFs.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
- Experienced in writing MapReduce programs to analyze data per business requirements.
- Used Pig in three distinct workloads like pipelines, iterative processing and research.
- Moved all log files generated from various sources to HDFS for further processing through Kafka, Flume and Splunk, and processed the files using Piggybank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Implemented MapReduce jobs to write data into Avro format.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented various MapReduce jobs in custom environments and updated the results into HBase tables by generating Hive queries.
- Used Sqoop for various file transfers through HBase tables, moving data to several NoSQL databases: Cassandra and MongoDB.
- Developed Hive UDFs and reused them across other requirements.
- Worked on performing Join operations.
- Used Hadoop's Pig, Hive and MapReduce to analyze health insurance data and transform it into meaningful data sets covering medicines, diseases, symptoms, opinions, geographic region details, etc.
- Worked on data analytics using Pig and Hive on Hadoop.
- Evaluated Oozie for workflow orchestration in the automation of MapReduce jobs, Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Captured the data logs from web server into HDFS using Flume & Splunk for analysis.
- Experienced in writing Pig scripts and Pig UDFs to pre-process the data for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Reporting expertise with Talend.
- Used JSTL and built custom tags whenever necessary.
- Used Expression Language to tie beans to UI components.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
Environment: Hadoop 1.x, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Kafka, Storm, Oozie, Avro, Splunk, Talend, GitHub, Java, SQL, Linux, UNIX shell & Python scripting.
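The partitioning and bucketing by state mentioned in this role can be pictured concretely: Hive routes each row to a partition directory named after the partition column, and within it to a bucket chosen by hashing the bucketing column modulo the bucket count. A pure-Python sketch of that routing; `crc32` stands in for Hive's own hash function, and the column names are hypothetical:

```python
import zlib

def bucket_for(value, num_buckets):
    """Hive-style bucket assignment: hash the bucketing column,
    take it modulo the bucket count (crc32 used here for determinism)."""
    return zlib.crc32(str(value).encode()) % num_buckets

def partition_path(state, user_id, num_buckets=4):
    """Partition directory keyed by state, bucket file within it."""
    return f"state={state}/bucket_{bucket_for(user_id, num_buckets)}"

path = partition_path("IL", 42)
```

Partition pruning then lets a query on one state skip all other directories, while bucketing enables efficient sampling and bucket-map joins.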
- Designed and developed the server-side layer using XML, JSP, JDBC, JNDI, EJB and DAO patterns in the Eclipse IDE.
- Developed Java beans and JSPs using Spring and JSTL tag libraries for supplements.
- Developed EJBs, Servlets and JSP files implementing business rules and security options on IBM WebSphere.
- Developed and tested the Efficiency Management module using EJB, Servlets, JSP and core Java components on WebLogic Application Server.
- Used the Spring Framework as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
- Configured Hibernate mapping descriptors to achieve object-relational mapping.
- Involved in developing Stored Procedures, Queries and Functions.
- Wrote SQL queries to pull information from the backend.
- Designed and implemented the architecture for the project using OOAD, UML design patterns.
- Involved in creating tables, stored procedures in SQL for data manipulation and retrieval using SQL Server, Oracle and DB2.
- Participated in requirement gathering and converting the requirements into technical specifications.
Environment: Java, JSF Framework, Eclipse IDE, Ajax, Apache Axis, OOAD and UML, WebLogic, JavaScript, HTML, XML, CSS, SQL Server, Oracle, Web services, Spring, Windows.
- Developed the user interface screens using Swing to accept system inputs such as contractual terms and monthly data pertaining to production, inventory and transportation.
- Involved in designing database connections using JDBC.
- Created tables and stored procedures for data manipulation and retrieval using SQL Server 2000; performed database modifications using SQL and PL/SQL, with triggers and views in Oracle.
- Used a dispatch action to group related actions into a single class.
- Built the applications using the Ant tool and used Eclipse as the IDE.
- Developed the business components used for the calculation module.
- Involved in the logical and physical database design and implemented it by creating suitable tables, views and triggers.
- Applied J2EE design patterns like business delegate, DAO and singleton.
- Created the related procedures and functions used by JDBC calls in the above requirements.
- Actively involved in testing, debugging and deployment of the application on WebLogic application server.
- Developed test cases and performed unit testing using JUnit.
- Involved in fixing bugs and minor enhancements for the front-end modules.
Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, J2EE, DAO, Ant, SQL, Swing and Eclipse.
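The DAO and singleton patterns applied in these J2EE projects can be sketched compactly. This is a hedged, language-neutral illustration in Python rather than the original Java; an in-memory SQLite connection stands in for the Oracle backend, and the `users` table and DAO methods are hypothetical:

```python
import sqlite3

class ConnectionManager:
    """Singleton: every caller shares one database connection."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.conn = sqlite3.connect(":memory:")
        return cls._instance

class UserDao:
    """DAO: callers go through this interface instead of issuing SQL directly."""
    def __init__(self):
        self.conn = ConnectionManager().conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")

    def save(self, user_id, name):
        self.conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))

    def find_by_id(self, user_id):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None
```

The same separation lets the persistence layer (JDBC calls, stored procedures) change without touching business components.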