- 7+ years of professional IT experience in the analysis, design, administration, development, deployment, and maintenance of critical software and Big Data applications.
- Over 3 years of experience on the Big Data platform as both a developer and an administrator.
- Hands-on experience developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, and Cassandra.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
- Installed and configured multiple Hadoop clusters of different sizes and with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
- Worked on the major distributions of Hadoop: Cloudera and Hortonworks.
- Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
- Handled Data Movement, data transformation, Analysis and visualization across the lake by integrating it with various tools.
- Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
- Good expertise in planning, installing, and configuring Hadoop clusters based on business needs.
- Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
- Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
- Experience working with different file formats such as Avro, Parquet, ORC, and SequenceFile, and compression techniques such as Gzip, LZO, and Snappy in Hadoop.
- Experience retrieving data from databases such as MySQL, Teradata, Informix, DB2, and Oracle into HDFS using Sqoop, and ingesting it into HBase and Cassandra.
- Experience writing Oozie workflows and Job Controllers for job automation.
- Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark Jobs.
- In-depth knowledge of Scala and experience building Spark applications using Scala.
- Good experience working with Tableau and Spotfire, and enabled JDBC/ODBC data connectivity from those tools to Hive tables.
- Designed neat and insightful dashboards in Tableau.
- Designed an array of reports including Crosstab, Chart, Drill-Down, Drill-Through, Customer-Segment, and Geodemographic-Segmentation reports.
- Deep understanding of Tableau features such as site and server administration, calculated fields, table calculations, parameters, filters (normal and quick), highlighting, level of detail, granularity, aggregation, lines, and many more.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
- Designed and developed multiple Model-2 MVC-based web applications using J2EE.
- Worked on various tools and IDEs such as Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.
- Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
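Several of the bullets above rest on the MapReduce batch model as run through Hadoop Streaming. A minimal word-count mapper/reducer pair in Python, sketched for illustration only (the sample input and the local chaining are assumptions; a real streaming job reads sys.stdin in each half and the framework handles the sort):

```python
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) pairs, one per token, as tab-separated records."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(records):
    """Sum counts per word; Hadoop delivers records sorted by key."""
    keyed = (r.split("\t") for r in records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # In a real job the shuffle sort happens between the two halves;
    # here we chain them locally on made-up input.
    mapped = sorted(mapper(["big data big wins", "data lake"]))
    for out in reducer(mapped):
        print(out)
```

The same two-function shape generalizes to most of the batch jobs described in this resume; only the emit and aggregate logic changes.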
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark
Programming Languages: Java (5, 6, 7), Python, Scala, C/C++, XML, Shell scripting, COBOL
Databases: MySQL, SQL/PL-SQL, MS SQL Server 2005, Oracle
NoSQL/ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS Office, MS Project, Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SOAP UI, ANT, Maven, Automation and MR-Unit
Cloud Platforms: Amazon (EC2, EMR, S3)
Version Control: CVS, Tortoise SVN
Visualization Tools: Tableau.
Servers: IBM WebSphere, WebLogic, Tomcat, and Red Hat Satellite Server
Sr. Hadoop Developer
Confidential, Atlanta, GA
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig, and Hive programs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Hive queries (HQL) to search for particular strings in Hive tables in HDFS.
- Possess good Linux and Hadoop System Administration skills, networking, shell scripting and familiarity with open source configuration management and deployment tools such as Chef.
- Worked with Puppet for application deployment.
- Experience developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Created HBase tables to store data in various formats coming from different sources.
- Used Maven to build and deploy code to the YARN cluster.
- Good knowledge on building Apache spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Used Apache Oozie for scheduling and managing Hadoop jobs; knowledge of HCatalog for Hadoop-based storage management.
- Expert in creating and designing data-ingest pipelines using technologies such as Spring Integration, Apache Storm, and Kafka.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test driven development and continuous integration.
- Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
- Responsible for managing data coming from different sources.
- Experienced in analyzing Cassandra and comparing it with other open-source NoSQL databases to find which best suits the current requirements.
- Used File System check (FSCK) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Applied Java/J2EE application development skills with object-oriented analysis, and was extensively involved throughout the Software Development Life Cycle (SDLC).
- Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS).
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Created a complete processing engine based on Cloudera’s distribution.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Spark Streaming collects this data from Kafka in near-real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in a NoSQL store (HBase).
- Configured Kerberos for the clusters.
Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.
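The Oozie-scheduled duplicate-log cleanup described in this role boils down to content-hashing files and pruning repeats. A self-contained sketch of that core step in Python (the real job walked HDFS paths; the in-memory {path: bytes} interface and the MD5 choice here are assumptions for illustration):

```python
import hashlib


def find_duplicates(files):
    """Given {path: file contents as bytes}, return the paths whose
    content repeats an earlier file; the first occurrence is kept."""
    seen, dupes = {}, []
    for path in sorted(files):  # sorted for a deterministic keep-order
        digest = hashlib.md5(files[path]).hexdigest()
        if digest in seen:
            dupes.append(path)  # candidate for deletion from HDFS
        else:
            seen[digest] = path
    return dupes
```

In the scheduled job, the returned paths would be fed to an HDFS delete step; hashing content rather than comparing names catches re-shipped logs under new filenames.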
Sr. Hadoop Developer
Confidential, Fort Lauderdale, FL
- Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail Talend ETL applications to Hadoop.
- Installed and configured Hive, Pig and Sqoop on the HDP 2.0 cluster.
- Performed real-time analytics on HBase using the Java API and fetched data to/from HBase by writing MapReduce jobs.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing on HDP 2.0.
- Wrote SQL queries to process the data using Spark SQL. Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive, making the data available.
- Extracted data from different databases and copied it into HDFS using Sqoop.
- Created Talend Mappings to populate the data into Staging, Dimension and Fact tables.
- Worked on a project to retrieve log messages by leveraging Spark Streaming.
- Designed Oozie jobs for the automatic processing of similar data; collected the data using Spark Streaming.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs. Used the Scala collections framework to store and process complex consumer information, and Scala functional programming concepts to develop business logic.
- Developed Pig scripts in areas where extensive coding needed to be reduced.
- In-depth understanding of the Scala programming language along with the Lift framework; generated Scala and Java classes from the respective APIs so they could be incorporated into the overall application.
- Worked with Spark Streaming to ingest data into the Spark engine. Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Handled importing of data from various data sources using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Worked on running reports in the Linux environment, wrote shell scripts to generate those reports, and used Linux to manage files.
- Parsed high-level design specification to simple ETL coding and mapping standards.
- Developed complex Talend job mappings to load data from various sources using different components. Designed, developed, and implemented solutions using Talend Integration Suite.
- Imported data from different sources, such as Talend ETL and the local file system, into Spark RDDs. Experience developing and maintaining applications written for Elastic MapReduce.
- Responsible for managing data coming from RDBMS sources and involved in HDFS maintenance and loading of structured data.
- Optimized several MapReduce algorithms in Java according to client requirements for big data analytics.
- Responsible for importing data from MySQL to HDFS and providing query capabilities using Hive.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
- Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
- Involved in writing shell scripts for the scheduling and automation of tasks.
Environment: Hadoop, Talend, MapReduce, HDFS, Jenkins, Hive, Pig, Spark, Storm, Kafka, Flume, Sqoop, Oozie, SQL, Scala, Java, and Eclipse.
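The log-retrieval work in this role starts with turning raw web-server lines into structured records before Spark Streaming picks them up. A hedged Python sketch of such a parsing step (the combined-log line format shown is a generic assumption, not taken from the actual project):

```python
import re

# Generic combined-log style: host, timestamp, request line, status code.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def parse_log_line(line):
    """Return a dict of named fields, or None for lines that don't match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    return rec
```

In the streaming pipeline, a function like this would be mapped over each micro-batch, with the None results filtered out before aggregation.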
Confidential, Tampa, FL
- Worked on a live 90-node Hadoop cluster running CDH 4.1.
- Worked with highly unstructured and semi-structured data, 120 TB in size (360 TB with replication).
- Developed Hive queries on data logs to perform trend analysis of user behavior across various online modules.
- Developed Pig UDFs to pre-process the data for analysis. Involved in the setup and deployment of the Hadoop cluster.
- Developed MapReduce programs for refined queries on big data. Involved in loading data from the UNIX file system to HDFS.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancers, and Auto Scaling groups.
- Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
- Managing and scheduling jobs on a Hadoop cluster using Oozie.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.6.3 APIs to produce messages.
- Provided daily code contributions and worked in a test-driven development environment.
- Installed and configured Talend ETL on single- and multi-server environments.
- Developed Merge jobs in Python to extract and load data into MySQL database.
- Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to a standard framework.
- Developed simple to complex MapReduce jobs using Hive. Implemented partitioning and bucketing in Hive.
- Mentored the analyst and test teams in writing Hive queries. Involved in setting up HBase to use HDFS.
- Extensively used Pig for data cleansing.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Knowledgeable in Spark and Scala, mainly through framework exploration for the transition from Hadoop/MapReduce to Spark.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
Environment: UNIX shell scripting, Python, Oracle 11g, DB2, HDFS, Kafka, Storm, Spark, ETL, Java (JDK 1.7), Pig, Linux, Cassandra, MapReduce, MS Access, Toad, SQL, Scala, MySQL Workbench, XML, NoSQL, SOLR, HBase, Hive, Sqoop, Flume, Talend, Oozie.
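The Python merge jobs into MySQL mentioned in this role follow a standard upsert pattern: attempt an UPDATE, and INSERT when no row matched. A self-contained sketch of that pattern (sqlite3 stands in for MySQL so the example runs anywhere; the staging table and its columns are invented for illustration):

```python
import sqlite3


def merge_rows(conn, rows):
    """Manual MERGE: update rows whose id already exists, insert the rest."""
    cur = conn.cursor()
    for rid, value in rows:
        cur.execute("UPDATE staging SET value = ? WHERE id = ?", (value, rid))
        if cur.rowcount == 0:  # no existing row matched -> insert
            cur.execute(
                "INSERT INTO staging (id, value) VALUES (?, ?)", (rid, value)
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, value TEXT)")
merge_rows(conn, [(1, "a"), (2, "b")])    # initial load
merge_rows(conn, [(2, "b2"), (3, "c")])   # one update, one insert
```

Against a real MySQL target the same effect is usually achieved with `INSERT ... ON DUPLICATE KEY UPDATE`; the explicit two-step form above makes the merge logic visible.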
- Involved in the analysis, design, and development of the Expense Processing system.
- Designed Use Case Diagrams, Class Diagrams and Sequence Diagrams and Object Diagrams to model the detail design of the application using UML.
- Installed, configured, and administered Hadoop clusters of the major Hadoop distributions.
- Wrote MapReduce jobs in Java, Pig, and Python.
- Extensively worked with workflow/schedulers like Oozie and Scripting using Unix Shell Script, Python, and Perl.
- Worked with SQL and NoSQL (MongoDB, Cassandra, Hadoop) data structures
- Managing and reviewing Hadoop log files
- Ran Hadoop Streaming jobs to process terabytes of XML-format data.
- Worked on Hadoop cluster migrations and upgrades.
- Extensively worked with Cloudera Hadoop distribution components and custom packages
- Built reporting using Tableau.
- Applied ETL principles and best practices
- Developed the user interface using JSP, HTML, CSS, and JavaScript to simplify the complexities of the application.
- Used an AJAX framework for dynamic searching of bill expense information.
- Created a dynamic end-to-end REST API with the LoopBack Node.js framework.
- Configured the Spring Framework for the entire business-logic layer.
- Developed code using various patterns such as Singleton, Front Controller, Adapter, DAO, MVC, Template, Builder, and Factory.
- Developed one-to-many, many-to-one, one-to-one annotation-based mappings in Hibernate.
- Developed DAO service methods to populate the domain model objects using Hibernate.
- Used the Spring Framework's BeanFactory for initializing services.
- Used the Java Collections API extensively, including Lists, Sets, and Maps.
- Wrote DAO classes using Spring and Hibernate to interact with the database for persistence.
- Involved in the creation of tables, indexes, sequences, and constraints, and created stored procedures and triggers used to implement business rules.
- Installed SQL Server on development and production servers, setting up databases, users, roles, and permissions.
- Extensively involved in SQL joins, subqueries, tracing, and performance tuning for better-running queries.
- Provided documentation for database/data-warehouse structures and updated functional specifications and technical design documents.
- Designed and created different ETL packages using SSIS to transfer data from heterogeneous sources and file formats (Oracle, SQL Server, and flat files) to SQL Server destinations.
- Worked on several Data Flow transformations using SSIS controls, including Derived Column, Slowly Changing Dimension, Lookup, Fuzzy Lookup, Data Conversion, Conditional Split, and many more.
- Created various reports with drill-downs, drill-throughs, and calculated members using SQL Server Reporting Services.
- Used various report items such as tables, sub-reports, and charts to develop reports in SSRS and uploaded them to Report Manager.
- Created complex stored procedures, triggers, functions, indexes, tables, views, SQL joins, and other T-SQL code to implement business rules.
- Used Performance Monitor and SQL Profiler to optimize queries and enhance the performance of database servers.
Environment: MS SQL Server 2012/2008R2/2008, T-SQL, SQL Server Reporting Services (SSRS), SSIS, SSAS, Business Intelligence Development Studio (BIDS), MS Excel, Visual SourceSafe, Team Foundation Server, VB Script.
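The Slowly Changing Dimension work above follows the Type-2 pattern: when a tracked attribute changes, expire the current dimension row and insert a new current version. A sketch of that logic in Python (sqlite3 stands in for SQL Server so the example is self-contained; the dim_customer schema and the tracked "segment" attribute are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
           customer_id INTEGER,
           segment TEXT,
           is_current INTEGER)"""
)


def apply_scd2(conn, customer_id, segment):
    """Type-2 SCD: expire the current row if the tracked attribute
    changed, then insert the new version as the current row."""
    row = conn.execute(
        "SELECT sk, segment FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row and row[1] == segment:
        return  # attribute unchanged: nothing to do
    if row:
        conn.execute(
            "UPDATE dim_customer SET is_current = 0 WHERE sk = ?", (row[0],)
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, segment, is_current) "
        "VALUES (?, ?, 1)",
        (customer_id, segment),
    )


apply_scd2(conn, 42, "retail")
apply_scd2(conn, 42, "wholesale")  # change: old row expired, new row current
```

In SSIS the Slowly Changing Dimension component generates an equivalent expire-and-insert flow; production versions typically also carry effective-date columns, omitted here for brevity.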