- Around 8+ years of professional IT experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
- 4+ years of experience working with Big Data technologies on systems that process massive amounts of data in highly distributed mode on the Cloudera and Hortonworks Hadoop distributions.
- Hands-on experience with Hadoop ecosystem components such as Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
- Strong knowledge of Spark architecture and components; proficient with Spark Core, Spark SQL and Spark Streaming.
- Implemented Spark Streaming jobs using RDDs (Resilient Distributed Datasets), working in PySpark and spark-shell.
- Experience configuring Spark Streaming in Scala to receive real-time data from Apache Kafka and store the stream data in HDFS.
- Experience ingesting and exporting data using streaming platforms such as Flume and Kafka.
- Wrote complex HiveQL queries to extract required data from Hive tables and wrote Hive UDFs as needed.
- Good experience with partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Used Spark DataFrame operations to perform required data validations.
- Experience integrating Hive queries into the Spark environment using Spark SQL.
- Good understanding of NoSQL databases such as Confidential, HBase and Cassandra.
- Loaded and retrieved HBase data for real-time processing using the REST API.
- Excellent understanding of job workflow scheduling and coordination/locking services such as Oozie and ZooKeeper.
- Experienced in designing time-driven and data-driven automated workflows using Oozie.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Developed ETL workflows in Python to process ingested data in HDFS and HBase, orchestrated with Oozie.
- Experience configuring ZooKeeper to coordinate cluster servers and maintain data consistency.
- Experienced with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Capable of using AWS services such as EMR, S3 and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Developed Impala scripts for extracting, transforming and loading data into the data warehouse.
- Good knowledge of Apache NiFi for automating data movement between Hadoop systems.
- Experienced in writing Pig scripts for transformations, event joins, filters and pre-aggregations before storing data in HDFS.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Good knowledge of UNIX shell scripting for automating deployments and other routine tasks.
- Experience with relational databases such as Oracle, MySQL and SQL Server.
- Experienced with integrated development environments such as Eclipse, NetBeans, IntelliJ and Spring Tool Suite.
- Used JIRA for tracking issues and code bugs and GitHub for code reviews; worked with version control tools such as Git and SVN.
- Experience developing Java GUIs using JFC, Swing, JavaBeans and AWT.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
- Experienced with the full SDLC under Agile and Waterfall methodologies.
- Excellent communication, interpersonal and problem-solving skills; a team player able to adapt quickly to new environments and technologies.
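The Hive partitioning and bucketing concepts listed above can be illustrated with a minimal Python sketch (not Hive code). It assumes a hypothetical table partitioned by a `dt` column and bucketed on an integer `user_id` into 4 buckets. Hive stores each partition in a `column=value` subdirectory, and a filter on the partition column lets a query skip non-matching directories. The bucket here is the masked key modulo the bucket count, a simplification; Hive's actual hash depends on the column type and bucketing version.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical: CLUSTERED BY (user_id) INTO 4 BUCKETS


def layout(rows, num_buckets=NUM_BUCKETS):
    """Group rows by (partition directory, bucket number).

    Each row is a dict with a 'dt' partition value and an integer 'user_id'.
    The integer key itself stands in for Hive's type-dependent hash.
    """
    files = defaultdict(list)
    for row in rows:
        partition_dir = f"dt={row['dt']}"  # one directory per partition value
        # Mask to a non-negative value before taking the modulus.
        bucket = (row["user_id"] & 0x7FFFFFFF) % num_buckets
        files[(partition_dir, bucket)].append(row["user_id"])
    return dict(files)


def prune(files, dt):
    """Partition pruning: a query filtered on dt reads only matching directories."""
    return {key: ids for key, ids in files.items() if key[0] == f"dt={dt}"}


rows = [
    {"dt": "2020-01-01", "user_id": 7},
    {"dt": "2020-01-01", "user_id": 12},
    {"dt": "2020-01-02", "user_id": 3},
    {"dt": "2020-01-02", "user_id": 11},
]
files = layout(rows)
print(sorted(files))
print(prune(files, "2020-01-02"))
```

Bucketing on the join or sampling key means rows with equal keys always land in the same bucket file, which is what makes bucket-based joins and sampling efficient.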
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Mahout, Confidential, Cassandra, Avro, Storm, Parquet and Snappy.
Languages: Java, Python, Scala
Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Spring MVC
NoSQL Databases: Cassandra, Confidential and HBase
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Linux, Mac OS and Windows
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
Confidential, Herndon, VA
Sr. Hadoop Developer
- Working knowledge of Spark RDDs, the DataFrame API, Dataset API and Data Source API, Spark SQL and Spark Streaming.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Optimized Spark joins, troubleshot and monitored Spark jobs, and wrote efficient Scala code.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Analyzed the Hadoop stack and big data analytic tools including Pig, Hive, HBase and Sqoop.
- Developed a Spark Streaming script that consumes topics from the distributed messaging system Kafka and periodically pushes batches of data to Spark for real-time processing.
- Experienced in implementing the Hortonworks distribution.
- Created Hive tables and analyzed their data to meet requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Used Spark DataFrame operations to perform required data validations and analytics on Hive data.
- Experienced in working with Amazon Elastic MapReduce (EMR).
- Developed MapReduce programs for refined queries on big data.
- In-depth understanding of classic MapReduce and YARN architecture.
- Worked with the business team to create Hive queries for ad hoc access.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Implemented Hive GenericUDFs for business logic.
- Analyzed data using Hive queries, Pig scripts, Spark SQL and Spark Streaming.
- Installed and configured Pig for ETL jobs.
- Responsible for developing multiple Kafka producers and consumers from scratch according to the software requirement specifications.
- Created a generic Sqoop import script for loading data from RDBMS into Hive tables.
- Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per business requirements.
- Extracted data from Cassandra through Sqoop and placed it in HDFS for further processing.
- Enabled faster reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Created detailed AWS security groups that act as virtual firewalls controlling the traffic allowed to reach one or more EC2 instances.
- Helped create a data lake by extracting customer data into HDFS from various sources, including Excel files, databases and server logs.
- Used Talend Data Integration to move more data effectively, efficiently and with high performance in support of business-critical projects.
- Designed, developed, unit tested and supported ETL mappings and scripts for data marts using Talend.
- Used Hue to run Hive queries; created daily partitions in Hive to improve performance.
- Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
- Participated in Agile practices, including daily Scrum meetings and sprint planning.
- Used version control tools such as GitHub to share code among team members.
Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.
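The Kafka-to-Spark Streaming job described above processes data in micro-batches: records that arrive during one batch interval are handled together as one batch. The Python sketch below models only that grouping and a per-batch count, assuming in-memory `(timestamp, value)` records and a hypothetical 10-second interval; it does not use the Kafka or Spark APIs.

```python
from collections import defaultdict


def micro_batches(records, interval_s):
    """Assign each (timestamp, value) record to the batch covering its arrival time.

    Batch k covers [k * interval_s, (k + 1) * interval_s). Intervals with no
    records are skipped here, whereas Spark Streaming would still run an
    empty batch for them.
    """
    batches = defaultdict(list)
    for ts, value in records:
        batches[int(ts // interval_s)].append(value)
    # Return batches in time order, as a streaming job would process them.
    return [batches[k] for k in sorted(batches)]


def event_counts(batch):
    """Per-batch aggregation: count events by type."""
    counts = defaultdict(int)
    for event in batch:
        counts[event] += 1
    return dict(counts)


records = [(0.5, "click"), (3.2, "view"), (9.9, "click"), (12.0, "view"), (25.1, "click")]
for batch in micro_batches(records, interval_s=10):
    print(event_counts(batch))
```

A shorter interval lowers latency but adds per-batch scheduling overhead, which is the main trade-off when choosing the batch duration.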
Confidential, Green, OH
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Used Spark for interactive queries, streaming data processing and integration with a popular NoSQL database for large data volumes.
- Developed Spark scripts using Scala shell commands as required.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Used Spark RDDs for faster data sharing.
- Queried data using Spark SQL on top of the Spark engine for faster dataset processing.
- Worked on ad hoc queries, indexing, replication, load balancing and aggregation in Confidential.
- Extracted and restructured data into Confidential using the import and export command-line utilities.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Hive and Confidential.
- Wrote XML workflow definitions to build Oozie functionality.
- Used the Oozie workflow scheduler to manage and schedule jobs on the Hadoop cluster for generating daily and weekly reports.
- Used Flume to collect, aggregate and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Implemented custom serializer, interceptor, source and sink in Flume to ingest data from multiple sources.
- Wrote Impala queries for faster data processing.
- Analyzed and cleansed raw data using Hive/Impala queries and Pig scripts.
- Moved log files generated from various sources into HDFS through Flume for further processing.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to Hive and the AWS cloud.
- Partitioned Hive tables and ran scripts in parallel to reduce their run time.
- Analyzed data with Hive queries and Pig scripts to understand user behavior.
- Wrote Pig scripts with replicated and skewed joins to achieve better performance.
- Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Designed and created Talend ETL jobs to load huge volumes of data into Confidential, the Hadoop ecosystem and relational databases.
- Created Talend jobs to connect to Quality Stage using FTP connection and process data received from Quality Stage.
- Migrated data from MySQL Server to Hadoop using Sqoop for processing.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experienced in developing Shell scripts and Python scripts for system management.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Worked with the Scrum team to deliver agreed user stories on time in every sprint.
Environment: CDH 3.x and 4.x, Java, Hadoop, Python, MapReduce, Hive, Pig, Impala, Flume, Confidential, Sqoop, Talend, Spark, MySQL, AWS.
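The replicated joins mentioned above are map-side joins: the small relation is loaded fully into memory and the large relation is streamed past it, so the large side needs no shuffle. In Pig the first relation is the large one and the following relations are held in memory (`JOIN big BY key, small BY key USING 'replicated'`). The plain-Python sketch below shows the same strategy with hypothetical transaction and customer records; it is an illustration, not Pig's implementation.

```python
def replicated_join(large, small, key):
    """Inner join that builds an in-memory lookup from the small relation.

    Each record of the large relation is probed against the lookup table,
    so the large side is never grouped or sorted.
    """
    lookup = {}
    for row in small:
        lookup.setdefault(row[key], []).append(row)
    for row in large:
        for match in lookup.get(row[key], []):
            yield {**row, **match}


transactions = [
    {"cust_id": 1, "amount": 40},
    {"cust_id": 2, "amount": 15},
    {"cust_id": 1, "amount": 5},
    {"cust_id": 9, "amount": 99},  # no matching customer: dropped by the inner join
]
customers = [{"cust_id": 1, "region": "east"}, {"cust_id": 2, "region": "west"}]

for joined in replicated_join(transactions, customers, "cust_id"):
    print(joined)
```

This only pays off when the small relation fits comfortably in each task's memory; skewed joins address the different problem of a few very frequent keys.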
Confidential - San Francisco, CA
- Worked on migrating MapReduce programs into Spark transformations using Scala. Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Wrote complex Java MapReduce jobs to extract, transform and aggregate terabytes of data.
- Collected and aggregated large amounts of stream data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Analyzed data using Hadoop components Hive and Pig.
- Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
- Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently based on time and data availability.
- Responsible for creating Hive tables, loading data and writing Hive queries to analyze data.
- Generated reports using QlikView.
- Wrote several Hive queries to extract valuable insights hidden in large datasets.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Imported data from the Teradata database into HDFS and exported the analyzed pattern data back to Teradata using Sqoop.
- Worked with Talend Open Studio to perform ETL jobs.
Environment: Hadoop(Hortonworks), HDFS, Hive, Pig, Sqoop, Map Reduce, HBase, Shell Scripting, QlikView, Teradata 14, Oozie, Java 7, Maven 3.x.
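The Java MapReduce jobs described above follow three phases: map, shuffle (group by key) and reduce. The Python sketch below runs those phases in a single process to show the data flow for a simple aggregation, total bytes per HTTP status code over hypothetical space-separated log lines; it does not use the Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Emit (status, bytes) pairs from a log line ending in '<status> <bytes>'."""
    fields = line.split()
    status, size = fields[-2], fields[-1]
    if size.isdigit():  # skip lines such as '... 304 -' that carry no byte count
        yield status, int(size)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Aggregate all values for one key."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in sorted(shuffle(pairs).items()))


logs = [
    "10.0.0.1 GET /index.html 200 512",
    "10.0.0.2 GET /missing 404 128",
    "10.0.0.1 GET /img.png 200 2048",
    "10.0.0.3 GET /index.html 304 -",
]
print(run_job(logs))
```

In Hadoop the mapper and reducer run as separate tasks across the cluster, and because this reducer is a plain sum it could also serve as a combiner to shrink map output before the shuffle.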
Confidential - Bethesda, MD
- Installed and configured the JDK, Hadoop, Pig, Sqoop, Hive and HBase in a Linux environment; assisted with performance tuning and monitoring.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Created reports for the BI team, using Sqoop to move data into HDFS and Hive.
- Created MapReduce programs to parse data for claim report generation and ran the JARs on Hadoop; coordinated with the Java team on MapReduce development.
- Created Pig scripts for most modules to provide a comparative effort estimate for code development.
- Collaborated with BI teams to ensure data quality and availability with live visualization.
- Created Hive queries to process large sets of structured, semi-structured and unstructured data and store it in managed and external tables.
- Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API.
- Performed test runs of the module components to assess productivity.
- Wrote Java programs to retrieve data from HDFS and expose it through REST services.
- Shared responsibility for administering Hadoop, Hive, Sqoop, HBase and Pig within the team.
- Shared knowledge of Hadoop concepts with team members.
- Used JUnit for unit testing and Continuum for integration testing.
Environment: Cloudera, Hadoop, Pig, Sqoop, Hive, HBase, Java, Eclipse, MySQL, MapReduce.
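Regular Sqoop loads such as the MySQL-to-HDFS imports above are often run in incremental append mode (`--incremental append --check-column <col> --last-value <n>`), where each run imports only rows whose check column exceeds the last value recorded by the previous run. Whether these imports used that mode is an assumption. The Python sketch below models only the bookkeeping with an in-memory table and a hypothetical `id` check column; it is not Sqoop code.

```python
def incremental_import(source_rows, check_column, last_value):
    """Return the new rows and the last value to use on the next run.

    Only rows with check_column > last_value are imported; the next last
    value is the maximum seen in this run, or unchanged if nothing is new.
    """
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    next_value = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, next_value


table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
imported, last = incremental_import(table, "id", 0)       # first run: both rows
table.append({"id": 3, "name": "c"})
imported2, last2 = incremental_import(table, "id", last)  # second run: only id 3
print(len(imported), last, [r["id"] for r in imported2], last2)
```

Append mode suits tables whose check column only grows, such as an auto-increment key; rows updated in place need Sqoop's `lastmodified` mode instead.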
- Responsible for analyzing and documenting requirements and for designing and developing the application based on J2EE standards; strictly followed test-driven development.
- Used Microsoft Visio to design use cases, class diagrams, sequence diagrams and data models.
- Designed a rich internet application using jQuery-based accordion styles.
- Used Spring MVC and Dependency Injection for handling presentation and business logic. Integrated Spring DAO for data access using Hibernate.
- Developed Struts web forms and actions for validation of user request data and application functionality.
- Developed programs that access the database through the JDBC thin driver to execute queries, prepared statements and stored procedures and to manipulate data.
- Created tile definitions, Struts configuration files, validation files and resource bundles for all modules using Struts framework.
- Coded and integrated several business-critical modules using Java, JSF and Hibernate.
- Developed SOAP-based web services for communication with upstream applications.
- Implemented design patterns such as DAO and Singleton, along with the Spring MVC architectural pattern.
- Implemented Service Oriented Architecture (SOA) on Enterprise Service Bus (ESB).
- Developed Message-Driven Beans for asynchronous processing of alerts using JMS.
- Used Rational Rose for application development.
- Used ClearCase for source code control and JUnit for unit testing.
- Performed integration testing of the modules.
- Used PuTTY to log in to UNIX servers to run batch jobs and check server logs.
- Deployed the application to GlassFish Server.
- Involved in peer code reviews.
Environment: Java 6,7, J2EE, Struts 2, Glassfish, JSP, JDBC, EJB, ANT, XML, IBM Web Sphere, JUnit, IBM DB2, Rational Rose 7, CVS, UNIX, SOAP, SQL, PL/SQL.
- Documented functional and technical requirements, wrote Technical Design Documents.
- Developed analysis level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
- Developed presentation layer components comprising JSP, AJAX, Servlets and JavaBeans using the Struts framework.
- Implemented MVC (Model View Controller) architecture.
- Developed XML configuration and data description using Hibernate.
- Developed web services using CXF to interact with mainframe applications.
- Responsible for deploying the application in the development environment on BEA WebLogic 9.0 application server.
- Participated in the configuration of BEA WebLogic application server.
- Designed and developed front end user interface using HTML and Java Server Pages (JSP) for customer profile setup.
- Developed Ant scripts to compile the Java files and build the JARs and WARs.
- Responsible for Analysis, Coding and Unit Testing and Production Support.
- Used JUnit for testing Modules.