- 7+ years of overall software development experience with Big Data technologies, the Hadoop ecosystem, and Java/J2EE technologies, programming in Java, Scala, Python, and SQL.
- 4+ years of strong hands-on experience with the Hadoop ecosystem, including Spark, MapReduce, Hive, Pig, HDFS, YARN, HBase, Oozie, Kafka, Sqoop, and Flume.
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
- Worked with real-time data processing and streaming techniques using Spark Streaming, Storm, and Kafka.
- Experience moving data between HDFS and relational database systems (RDBMS) using Apache Sqoop.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
- Good knowledge of Informatica as an ETL tool and of stored procedures to pull data from source systems and files, then cleanse, transform, and load it into databases.
- Experience installing, configuring, managing, supporting, and monitoring Hadoop clusters using Apache Spark, Cloudera, and the AWS service console.
- Strong experience productionizing end-to-end data pipelines on the Hadoop platform.
- Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
- Experience in architecting, designing, and building distributed software systems.
- Worked extensively with UNIX shell scripts for batch processing.
- Experience using various Hadoop distributions such as Cloudera, Hortonworks, and Amazon EMR.
- Experience in developing service components using JDBC.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
- Experienced in writing complex MapReduce programs that work with different file formats such as text, SequenceFile, XML, JSON, and Avro.
- Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL and DML SQL queries.
- Good knowledge of NoSQL databases, including Cassandra, MongoDB, and HBase.
- Worked on HBase to load and retrieve data for real-time processing using the REST API.
- Experience developing applications using Waterfall and Agile (XP and Scrum) methodologies.
- Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, ZooKeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters
Languages: C, C++, Core Java, Shell Scripting, PL/SQL, Python, Pig Latin
Operating systems: Windows, Linux and Unix
DBMS / RDBMS: Oracle, Talend ETL, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata SQL, and NoSQL (MongoDB, Cassandra, HBase)
IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence
Version Control: Git, SVN, CVS
Web Services: RESTful, SOAP
Web Servers: WebLogic, WebSphere, Apache Tomcat
Senior Hadoop Spark Developer
- Involved in writing Java MapReduce jobs.
- Wrote Apache Pig scripts to process data in HDFS.
- Created Hive tables to store the processed results in a tabular format.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities with Scala.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
- Experienced in handling large datasets during ingestion using partitions, Spark's in-memory capabilities, broadcast variables, and efficient joins and transformations.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Strong experience working with Amazon Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
- Pulled Excel data into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, using Sqoop to move data between HDFS and relational database systems in both directions.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
- Developed Spark scripts using Scala shell commands as required.
- Developed Hive queries and UDFs.
- Developed an ETL workflow that pushes web server logs to an Amazon S3 bucket.
- Worked with BI teams to generate reports and design ETL workflows on Tableau.
- Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling, and error handling.
- Performed schema validation using MapReduce in Java.
- Wrote scripts for processing data and loading it into HDFS.
- Created external Hive tables on top of parsed data.
- Moved all log and text files generated by various products into HDFS.
- Participated actively in Scrum meetings and followed Agile methodology for implementation.
Environment: Linux/UNIX, CentOS, Hadoop 2.4.x, Oozie, Hive 0.13, Sqoop, Flume, Kafka, Cassandra, Spark, Hortonworks 2.1.1, AWS, Tableau, Avro.
Senior Hadoop Developer
Confidential, Houston, TX
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Involved in developing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
- Wrote Spark programs to load, parse, refine, and store sensor data in Hadoop, and to process, analyze, and aggregate data for visualizations.
- Created various views for HBase tables and leveraged the performance of Hive on top of HBase.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
- Developed a MapReduce program for parsing information and loading it into HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries. Wrote a Hive UDF to sort struct fields and return a complex data type.
- Responsible for loading data from the UNIX file system to HDFS.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using an MR testing library.
- Designed and developed a distributed processing system to process binary files in parallel and load the resulting analysis metrics into a data warehousing platform for reporting.
- Developed a workflow in Control-M to automate loading data into HDFS and pre-processing it with Pig.
- Provided cluster coordination services through ZooKeeper.
- Used Maven extensively to build JAR files for MapReduce programs and deployed them to the cluster.
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, AWS, Spark, UNIX, Tableau, Cosmos.
Confidential, Rego Park, NY
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Assisted in upgrading, configuring, and maintaining various Hadoop components such as Pig, Hive, and HBase.
- Developed and executed custom MapReduce programs, Pig Latin scripts, and HiveQL queries.
- Used Hadoop FS scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
- Performed Hive test queries on local sample files and HDFS files.
- Developed and optimized Pig and Hive UDFs (User-Defined Functions) to implement the functionality of external languages as and when required.
- Extensively used Pig for data cleaning and optimization.
- Developed Hive queries to analyze data and generate results.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Worked on reading multiple data formats on HDFS using Scala.
- Managed, reviewed and interpreted Hadoop log files.
- Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Worked on Solr for indexing and search optimization.
- Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal database.
- Analyzed user request patterns and implemented various performance optimization measures including but not limited to implementing partitions and buckets in HiveQL.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Monitored workload, job performance and node health using Cloudera Manager.
- Used Flume to collect and aggregate weblog data from different sources and push it to HDFS.
- Integrated Oozie with MapReduce, Pig, Hive, and Sqoop.
Environment: Hadoop 1.x, HDFS, MapReduce, Pig 0.11, Scala, Spark, Hive 0.10, Crystal Reports, Sqoop, HBase, Shell Scripting, UNIX
Confidential, Longwood, FL
- Involved in writing Java MapReduce jobs.
- Converted delimited data and XML data to a common format (JSON) using Java MapReduce.
- Stored data in compressed formats such as Apache Avro.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Developed Pig UDFs for manipulating data according to business requirements and also developed custom Pig loaders.
- Worked with fully structured data at terabyte scale.
- Used Sqoop to pull data from different relational databases into Hive tables and HDFS.
- Created Avro schemas for this data.
- Created partitions for this data, which speeds up queries against large Hive tables.
- Created tables and views for different Customers according to their permissions.
- Performed partitioning and bucketing of Hive tables to store data on Hadoop.
- Involved in loading data from UNIX file system to HDFS.
- Integrated HBase with MapReduce to move bulk data into HBase.
- Created external tables using Hive and provided them for downstream use.
- Used ZooKeeper operational services for coordinating the cluster and scheduling workflows.
- Created ETL transforms and jobs to move data from files to the operational database and from the operational database to the data warehouse.
- Exported the results of transaction and sales data to an RDBMS after aggregations and computations using Sqoop.
Environment: Linux/UNIX, Ubuntu, Hadoop 2.0.3, Oozie, Pig, Hive, Sqoop, ZooKeeper, HBase, Flume.
- Involved in development of business domain concepts into Use Cases, Sequence Diagrams, Class Diagrams, Component Diagrams and Implementation Diagrams.
- Implemented various J2EE Design Patterns such as Model-View-Controller, Data Access Object, Business Delegate and Transfer Object.
- Responsible for analysis and design of the application based on MVC architecture, using the open-source Struts framework.
- Involved in configuring Struts, Tiles and developing the configuration files.
- Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
- Used the Spring Framework and integrated it with Struts.
- Involved in configuring web.xml and struts-config.xml according to the Struts framework.
- Designed a lightweight model for the product using the Inversion of Control principle and implemented it successfully using the Spring IoC container.
- Used the transaction interceptor provided by Spring for declarative transaction management.
- Used Spring dependency injection to manage dependencies between classes and promote loose coupling.
- Established JDBC connections to the database and developed SQL queries to manipulate the data.
- Wrote stored procedures and used Java APIs to call them.
- Developed various test cases, such as unit tests, mock tests, and integration tests, using JUnit.
- Experience writing Stored Procedures, Functions and Packages.
- Gathered specifications from the requirements.
- Developed the application using Struts MVC 2 architecture.
- Developed JSP custom tags and Struts tags to support custom User Interfaces.
- Developed front-end pages using JSP, HTML, and CSS.
- Developed core Java classes for utilities, business logic, and test cases.
- Developed SQL queries using MySQL and established database connectivity.
- Used stored procedures to perform various database operations.
- Used JDBC to interact with the database.
- Developed servlets to process requests.
- Implemented exception handling for error conditions.
- Designed sequence diagrams and use case diagrams to guide implementation.
- Used Rational Rose for design and implementation.