- Professional software developer with 8+ years of technical expertise in all phases of the software development life cycle, Big Data frameworks and Java/J2EE technologies.
- 4+ years of industry experience in data manipulation and Big Data analytics using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Cassandra, Solr and ZooKeeper.
- Hands-on expertise in row key and schema design with NoSQL databases such as MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
- Worked extensively with Spark using Scala on a cluster installed on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
- Hands-on experience developing Spark applications using Spark APIs such as Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience and knowledge of real time data analytics using Spark, Kafka and Flume.
- Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Ran Apache Hadoop, CDH and MapR distributions on EC2, including Amazon's managed Elastic MapReduce (EMR) service.
- Expertise in developing Pig Latin scripts and using Hive Query Language.
- Developed custom UDFs and UDAFs in Java to extend Hive and Pig core functionality.
- Created Hive tables to store structured data into HDFS and processed it using HiveQL.
- Worked with GUI-based Hive interaction tools such as Hue and Karmasphere for querying data.
- Experience validating and cleansing data using Pig statements, and hands-on experience developing Pig macros.
- Designed ETL workflows on Tableau, deployed data from various sources to HDFS, and generated reports using Tableau.
- Performed clustering, regression and classification using the machine learning libraries Mahout and Spark MLlib.
- Good experience with use-case development and with software methodologies such as Agile and Waterfall.
- Working knowledge of installing and maintaining Cassandra, configuring the cassandra.yaml file per business requirements, and performing reads/writes through Java JDBC connectivity.
- Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data warehouses.
- Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC.
- Good knowledge of build tools such as Maven, Gradle and Ant.
- Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache and Amazon EMR.
- Knowledge of installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
- In-depth understanding of Hadoop architecture and its components, such as HDFS, the MapReduce programming paradigm, High Availability and YARN architecture.
- Used project management tools such as JIRA to track issues and code-related bugs, and GitHub for code reviews.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, and serialization and deserialization in streaming applications.
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
NoSQL Databases: Cassandra, MongoDB, HBase
Java Technologies: JSE, Servlets, JavaBeans, JSP, JDBC, JNDI, AJAX, EJB and Struts.
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development/Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, Oracle
RDBMS: Teradata, Oracle 10g, 11i, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Red Hat Linux, Mac OS and Windows variants
Testing: Unit testing, Hive testing, Quality Center (QC)
ETL Tools: Talend, Informatica
Confidential, Chadds Ford, PA
Big Data Hadoop Developer
- Responsible for developing prototypes of the selected solutions and implementing complex big data projects with a focus on collecting, parsing, managing, analyzing and visualizing large sets of data using multiple platforms.
- Understanding how to apply technologies to solve big data problems and to develop innovative big data solutions.
- Developed Spark applications in Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Imported data from various sources into the Cassandra cluster using Sqoop. Created data models for Cassandra from the existing Oracle data model.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Currently working in Spark and Scala for data analytics. Maintain an ETL framework in Spark for writing data from HDFS to Hive.
- Use a framework written in Scala for ETL.
- Developed multiple Spark Streaming and Spark Core jobs with Kafka as the data pipeline.
- Make extensive use of ZooKeeper as a job scheduler for Spark jobs.
- Worked on Talend with Hadoop. Worked on migrating jobs from Informatica to Talend.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
- Experience on Kafka and Spark integration for real time data processing.
- Developed Kafka producer and consumer components for real time data processing.
- Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
- Experience configuring, designing, implementing and monitoring Kafka clusters and connectors.
- Worked on physical transformations of the data model, which involved creating tables, indexes, joins, views and partitions.
- Involved in analysis, design, system architectural design, process interface design and design documentation.
- Involved in Cassandra data modeling to create keyspaces and tables in a multi-data-center DSE Cassandra database.
Environment: Spark, HDFS, Kafka, MapReduce (MR1), Pig, Hive, Sqoop, Cassandra, AWS, Talend, Java, Linux Shell Scripting.
- Wrote Hive queries for data analysis to meet business requirements.
- Loaded and transformed large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Migrated data between RDBMS and HDFS/Hive with Sqoop.
- Used partitioning and bucketing in Hive, and designed both managed and external tables in Hive for optimized performance.
- Used Sqoop to import and export data between HDFS, MySQL and Hive.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Involved in loading data from UNIX/LINUX file system to HDFS.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Worked on implementing Spark Framework, a Java-based web framework.
- Designed and implemented Spark jobs to support distributed data processing.
- Wrote Spark code using Scala and Spark SQL for faster processing and testing of data sets.
- Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded them into MongoDB for further analysis.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, and disaster recovery backup; used it for distributed storage and processing with CRUD operations.
- Extracted and restructured data into MongoDB using its import and export command-line utilities.
- Managed and reviewed Hadoop and HBase log files. Created HBase tables to load large sets of semi-structured data coming from various sources.
- Performed data analysis with HBase using Hive external tables. Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.
- Imported data from relational databases into the Hadoop cluster using Sqoop.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experience creating, dropping and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Experience working with different join patterns; implemented both map-side and reduce-side joins.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported logs from web servers with Flume to ingest the data into HDFS. Used Flume with a spooling directory to load data from the local system to HDFS.
- Installed and configured Pig, and wrote Pig Latin scripts to convert data from text files to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Loaded data into HBase using both bulk load and non-bulk load.
- Worked with Tableau: integrated Hive with Tableau Desktop reports and published them to Tableau Server.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Developed Unix shell scripts to load a large number of files into HDFS from the Linux file system.
- Experience setting up the whole app stack, including setting up and debugging Logstash to send Apache logs to Elasticsearch.
- Used ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Worked in an Agile development environment using the Kanban methodology. Actively involved in daily scrums and other design-related meetings.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Coordinated continuously with the QA, production support and deployment teams.
Environment: MapReduce, Pig Latin, Hive, Apache Crunch, Spark, Scala, HDFS, HBase, Solr, Core Java, J2EE, Eclipse, Avro, Parquet, Sqoop, Impala, Hue, Flume, Oozie, Tableau, MongoDB, Jenkins, Agile Scrum methodology.
Confidential, Bay Area, CA
- Migrated and transformed large sets of structured, semi-structured and unstructured data from HBase through Sqoop into HDFS for further processing.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other codec file formats.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Installed, configured and maintained Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Created multiple Hive tables, ran Hive queries on that data, and implemented partitioning, dynamic partitioning and bucketing in Hive for efficient data access.
- Experienced in running batch processes using Pig Latin Scripts and developed Pig UDFs for data manipulation according to Business Requirements.
- Hands-on experience developing optimal strategies for distributing web log data over the cluster, and importing and exporting stored web log data into HDFS and Hive using Sqoop.
- Developed unit test cases for Hadoop MapReduce jobs and driver classes with the MR testing library.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Performed cluster tasks such as adding and removing nodes without affecting running jobs.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, REST web services, Elasticsearch, Hive, Pig, Oozie, Java (JDK 1.6), JSON, Eclipse, Sqoop.
- Coded server-side Servlets that receive requests from the client and process them by interacting with the Oracle database.
- Coded Java Servlets to control and maintain session state and handle user requests.
- Used JDBC to connect to the backend database and developed stored procedures.
- Developed code to handle web requests involving Request Handlers, Business Objects, and Data Access Objects.
- Created JSP pages, including the use of JSP custom tags and other JavaBean presentation methods, along with all HTML and graphical aspects of the site's user interface.
- Used XML for mapping the pages and classes and to transfer data universally among different data sources.
- Involved in unit testing and documentation.
- Wrote Servlets and JSPs to generate UI for an internal application.
- Developed the user interface using the JavaServer Faces UI component framework.
- Developed POJOs and Java beans to implement business logic.
- Managed data to and from the database using JDBC connections.
- Used Spring JDBC to write some DAO classes to interact with the database to access account information.
- Involved in creation of tables and indexes and wrote complex SQL queries.
- Used Git as version control system to manage the progress of the project.
- Used the JUnit framework for unit testing of the application.
- Handled requirements and worked in an agile process.