Hadoop Developer Resume
Portsmouth, NH
TECHNICAL SKILLS:
- HDFS
- Sqoop
- Flume
- Linux
- Oozie
- Hadoop
- Pig
- Hive
- HBase
- Cassandra
- Hadoop Cluster
- Amazon Web Services
WORK EXPERIENCE:
Hadoop Developer
Confidential, Portsmouth, NH
Responsibilities:
- Gathering requirements for near-real-time data replication into the EDL.
- Responsible for the architectural design and testing of the near-real-time data replication framework.
- Assisting with Palantir analytics use cases for Molina Healthcare.
- Responsible for gathering requirements for the daily and monthly incremental data loads.
- Providing data integration and data ingestion for the use cases required by Palantir analytics.
- Writing Mesa code to support the data transformation and data integration required for Palantir analytics use cases.
- Designed the daily and monthly incremental loads from SQL databases to HDFS and NoSQL databases such as HBase.
- Implemented partitioning in Hive and created HBase snapshot tables.
- Responsible for setting up PolyBase in Microsoft SQL Server and creating the corresponding tables in PolyBase.
- Built the data validation and data quality check frameworks using Talend Big Data.
- Built performance-optimization checks and error logging into the framework to test the ingestion pipeline.
- Responsible for communicating with different teams on inbound and outbound data requirements and preparing the data to meet them.
- Implemented Spark Streaming to bring real-time data into HDFS (see the ingestion sketch after this list).
- Implemented Kafka clustering and set up Kafka topics for real-time data ingestion.
- Developed Scala scripts to implement Spark jobs for analyzing and validating the ingested data.
- Implemented image upload to and retrieval from HBase to meet SLA requirements that traditional image retrieval could not satisfy for some applications.
- Involved in data ingestion and framework build for a predictive use case: cardiovascular disease (CVD) prediction.
- Implemented Apache Zeppelin with Hive, SQL, and Spark to build charts for the CVD use case.
- Built an end-to-end solution to store unstructured data such as images and PDFs in Hadoop and HBase and render the data back to different web applications using REST and Thrift APIs.
- Used the native Hadoop WebHDFS REST API to expose data residing in HDFS to various web applications.
- Integrated Apache Storm with Kafka to perform text analytics on image metadata; loaded clickstream data from Kafka into HDFS, HBase, and Hive through Storm.
- Used HIPI, an image-processing library designed for the Apache Hadoop MapReduce framework, to efficiently process claim-related images with MapReduce-style parallel programming.
- Providing L1 and L2 support for Palantir use-case business users across all states.
- Performed data validation and quality checks on data consumed by Palantir use cases.
- Worked on Power BI dashboards to implement changes based on business requirements.
- Experience creating Power BI dashboards (Power View, Power Query, Power Pivot, Power Map).
- Worked on the ELK stack (Elasticsearch, Logstash, and Kibana) to develop reports and risk indicators for archived spam and Websense data.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau
- Created views in Tableau Desktop that were published to the internal team for review, further data analysis, and customization using filters and actions.
- Wrote XSLT templates to transform content fetched from MarkLogic and render it in the intended format.
- Wrote CoRB tasks to process MarkLogic content in bulk.
- Analyzed search requirements and wrote search queries in XQuery using the MarkLogic developer API.
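A minimal sketch of the Kafka-to-Spark-Streaming-to-HDFS ingestion described above (Scala). The broker list, topic name, group id, batch interval, and output path are illustrative assumptions, and it presumes the spark-streaming-kafka-0-10 integration is on the classpath:

  // Direct (receiver-less) Kafka stream persisted to HDFS in 30-second micro-batches.
  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

  object KafkaToHdfs {
    def main(args: Array[String]): Unit = {
      val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> "broker1:9092,broker2:9092", // illustrative brokers
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "edl-ingest",
        "auto.offset.reset" -> "latest"
      )

      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Seq("claims-events"), kafkaParams)
      )

      // Write each micro-batch of record values to HDFS as text files.
      stream.map(_.value).saveAsTextFiles("hdfs:///data/edl/claims/stream")

      ssc.start()
      ssc.awaitTermination()
    }
  }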
Environment: HDFS, Sqoop, Flume, Linux, Oozie, Hadoop, Pig, Hive, HBase, Cassandra, Hadoop Cluster, Amazon Web Services
Hadoop Developer
Confidential, New Hampshire, NH
Responsibilities:
- Good understanding of and hands-on experience with Hadoop stack internals, Hive, Pig, and MapReduce; experience setting up clusters using Cloudera Manager.
- Migrated 160 tables from Oracle to Cassandra using Apache Spark (see the migration sketch at the end of this list).
- Built the front end using Spray, an actor-based framework, which proved to be an excellent choice for building a RESTful, lightweight, asynchronous web service.
- Implemented various routes for the application using Spray.
- Worked extensively with the Spark Core, Spark SQL, and Spark Streaming modules; wrote all Spark code in Scala.
- Assigned names to columns using Scala case classes.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Used SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Performed performance tuning for Spark Streaming, e.g., setting the right batch interval, the correct level of parallelism, appropriate serialization, and memory tuning.
- Involved in Spark-Cassandra data modeling.
- Performed manual and automated installation of Cloudera's Distribution Including Apache Hadoop (CDH3, CDH4) environments.
- Deep understanding of schedulers, workload management, availability, scalability and distributed data platforms.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Involved in loading data from UNIX file system to HDFS. Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in managing and reviewing Hadoop log files. Involved in running Hadoop streaming jobs to process terabytes of text data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Wrote Pig UDFs. Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop.
- Developed Hive queries for the analysts.
- Analyzed business requirements and data sources from Excel, Oracle, and SQL Server for the design, development, testing, and production rollout of reporting and analysis projects in Tableau.
- Implemented partitioning, dynamic partitions, and buckets in Hive. Exported result sets from Hive to MySQL using shell scripts.
- Used ZooKeeper for various types of centralized configuration. Maintained various UNIX shell scripts.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Automated all jobs, from pulling data from sources such as MySQL to pushing the result sets to the Hadoop Distributed File System using Sqoop.
- Used SVN for version control. Helped the team expand the cluster from 25 nodes to 40 nodes.
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Flume).
- Monitored system health and logs and responded to any warning or failure conditions.
- Knowledge of Linux administration, UNIX shell scripting, and Python.
- Hands-on experience writing MapReduce jobs in Java, Pig, and Python; wrote MapReduce programs to analyze data and discover trends in data usage by users.
- Worked on creating X12 and XPath transformations and XSLT mappings using TIBCO ActiveMatrix BusinessWorks.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Worked with architecture and testing teams to validate the usage of TIBCO products across the environment.
- Created Talend Mappings to populate the data into dimensions and fact tables.
- Worked in improving performance of the Talend jobs.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.3 for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop.
- Developed Splunk Dashboards, searches and reporting to support various internal clients in Security, IT Operations and Application Development.
- Prepared the High-Level Design (HLD) and Low-Level Design (LLD) documents for the Informatica ETL process.
- Responsible for developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Informatica.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes (see the partitioner sketch at the end of this role).
- Worked on creating Kafka adapters to decouple application dependencies.
- Load-tested Kafka and JMS to compare performance statistics.
- Worked on a proof of concept (POC) with Cloudera Impala to compare Impala and Hive response times for large batch processing.
- Installed, configured, and administered Splunk Enterprise Server and Splunk Forwarder on Windows servers.
- Managed Splunk Universal Forwarder deployment and configuration; monitored and maintained Splunk performance and optimization after deployment.
- Established and managed relationships with strategic big data partners to grow Qubole after Series C financing.
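A minimal sketch of the Oracle-to-Cassandra migration with Apache Spark noted above, written against the Spark 2.x DataFrame API for brevity (the JDBC URL, credentials, keyspace, and table names are illustrative assumptions, and the spark-cassandra-connector package is assumed to be on the classpath):

  import org.apache.spark.sql.SparkSession

  object OracleToCassandra {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("OracleToCassandra")
        .config("spark.cassandra.connection.host", "cassandra-node1") // illustrative host
        .getOrCreate()

      // Read one source table over JDBC; a real driver would loop over the full table list.
      val members = spark.read
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
        .option("dbtable", "MEMBERS")
        .option("user", "etl_user")
        .option("password", sys.env.getOrElse("ORACLE_PWD", ""))
        .load()

      // Write the DataFrame into the matching Cassandra table.
      members.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "edl", "table" -> "members"))
        .mode("append")
        .save()

      spark.stop()
    }
  }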
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Java 1.6, UNIX Shell Scripting
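The custom Kafka partitioner work called out in the role above could look roughly like this hedged sketch (Scala; the class name and hashing rule are illustrative, not the production logic):

  import java.util.{Map => JMap}
  import org.apache.kafka.clients.producer.Partitioner
  import org.apache.kafka.common.Cluster

  // Routes records onto partitions by hashing the key bytes (or the value bytes when no key
  // is set), so related events consistently land on the same partition.
  class RegionPartitioner extends Partitioner {
    override def configure(configs: JMap[String, _]): Unit = {}

    override def partition(topic: String, key: Any, keyBytes: Array[Byte],
                           value: Any, valueBytes: Array[Byte], cluster: Cluster): Int = {
      val numPartitions = cluster.partitionsForTopic(topic).size()
      val hashSource = Option(keyBytes).getOrElse(valueBytes)
      // Mask to a non-negative hash before taking the modulus.
      (java.util.Arrays.hashCode(hashSource) & Integer.MAX_VALUE) % numPartitions
    }

    override def close(): Unit = {}
  }

The partitioner is registered with the producer through the standard partitioner.class property, e.g. props.put("partitioner.class", classOf[RegionPartitioner].getName).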
Hadoop Developer
Confidential, Fort Myers, FL
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop extensively to import data from various systems and sources (such as MySQL) into HDFS.
- Managed Hadoop clusters using Cloudera Manager.
- AWS server provisioning using Chef Recipes.
- Applied Hive queries to perform data analysis on HBase using the storage handler to meet business requirements.
- Created components such as Hive UDFs to fill missing functionality in Hive for analytics.
- Hands-on experience with NoSQL databases such as HBase and Cassandra for a proof of concept (POC) storing URLs and images.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed scripts in Python to enable dynamic resource sharing across Hadoop workloads
- Worked with cloud services like Amazon Web Services (AWS)
- Involved in ETL, Data Integration and Migration
- Used different file formats like Text files, Sequence Files, Avro
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using Apache and MapR distributions.
- Provided cluster coordination services through ZooKeeper.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in cluster maintenance and monitoring, adding and removing cluster nodes, and troubleshooting.
- Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created joblets in Talend for processes reused across most jobs in a project, such as the Start job and Commit job.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch after this list).
- Helped the UNIX and Splunk administrators deploy Splunk across UNIX and Windows environments.
- Used Splunk to analyze application logs.
- Prepared ETL specifications and created mappings and Teradata scripts for extracting, transforming, and loading data into the data warehouse.
- Created views in Oracle to source data from the DART data warehouse and expose it to the AxiomSL reporting tool.
- Performed data analytics using Pig, Impala, Hive, and R for data scientists within the team to improve business and future strategy development.
- Developed jobs in Talend Enterprise Edition spanning stage, source, intermediate, conversion, and target layers.
- Created a Splunk app for Enterprise Security to identify and address emerging security threats through continuous monitoring, alerting, and analytics.
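A minimal sketch of the Spark RDD load and in-memory computation referenced above (Scala; the input path, delimiter, and field positions are illustrative assumptions):

  import org.apache.spark.{SparkConf, SparkContext}

  object ClaimCounts {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("ClaimCounts"))

      // Load delimited records from HDFS and cache them for repeated in-memory use.
      val claims = sc.textFile("hdfs:///data/claims/input")
        .map(_.split('|'))
        .filter(_.length > 3)
        .cache()

      // In-memory aggregation: count records per provider and write the response back to HDFS.
      claims.map(fields => (fields(1), 1L))
        .reduceByKey(_ + _)
        .saveAsTextFile("hdfs:///data/claims/provider_counts")

      sc.stop()
    }
  }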
Environment: HDFS, Sqoop, Flume, Linux, Oozie, Hadoop, Pig, Hive, HBase, Cassandra, Hadoop Cluster, Amazon Web Services
Hadoop Developer
Confidential, Providence, RI
Responsibilities:
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (see the sketch after this list).
- Involved in creating Hive tables, then applied HiveQL on those tables for data validation.
- Moved data from Hive tables into MongoDB collections.
- Worked on a Hadoop cluster that ranged from 4 to 8 nodes during the pre-production stage and was at times extended up to 24 nodes during production.
- Used Sqoop to import data from an RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Experienced in installing, upgrading, and managing Apache, Cloudera (CDH4), and Hortonworks distributions of Hadoop.
- Participated in requirements gathering from subject matter experts and business partners and converted the requirements into technical specifications.
- Used ZooKeeper to manage coordination among the clusters.
- Analyzed MongoDB and compared it with other open-source NoSQL databases to determine which better suited the current requirements.
- Assisted in exporting the analyzed data to an RDBMS using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades as required.
- Assisted in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
- Experience in writing Python Scripts.
- Worked with administrators to ensure Splunk is actively and accurately running and monitoring on the current infrastructure implementation.
- Extensive experience setting up Splunk to monitor customer volume and track customer activity; served as a Splunk admin capturing, analyzing, and monitoring front-end and middleware applications.
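A hedged sketch of the Hive partitioning and bucketing optimizations referenced above, issued over a HiveServer2 JDBC connection from Scala (the host, database, table, and column names are illustrative assumptions):

  import java.sql.DriverManager

  object HivePartitioningExample {
    def main(args: Array[String]): Unit = {
      Class.forName("org.apache.hive.jdbc.HiveDriver")
      val conn = DriverManager.getConnection("jdbc:hive2://hive-server:10000/default", "etl_user", "")
      val stmt = conn.createStatement()

      // Partition by load date and bucket by customer id to enable partition pruning and faster joins.
      stmt.execute(
        """CREATE TABLE IF NOT EXISTS claims_part (
          |  claim_id STRING,
          |  customer_id STRING,
          |  amount DOUBLE)
          |PARTITIONED BY (load_dt STRING)
          |CLUSTERED BY (customer_id) INTO 16 BUCKETS
          |STORED AS ORC""".stripMargin)

      // Dynamic-partition insert from a raw staging table.
      // (Older Hive releases also required SET hive.enforce.bucketing=true.)
      stmt.execute("SET hive.exec.dynamic.partition=true")
      stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
      stmt.execute(
        """INSERT OVERWRITE TABLE claims_part PARTITION (load_dt)
          |SELECT claim_id, customer_id, amount, load_dt FROM claims_raw""".stripMargin)

      conn.close()
    }
  }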
Environment: Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, Linux, Oozie, MongoDB
Hadoop Developer
Confidential, Fairfield, AL
Responsibilities:
- Gathered business requirements from business partners and subject matter experts; involved in installing Hadoop ecosystem components.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
- Involved in loading data from LINUX file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Implemented test scripts to support test driven development and continuous integration.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Mentored analysts and the test team in writing Hive queries.
- Installed the Oozie workflow engine to run multiple MapReduce jobs.
- Worked with application teams to install operating system patches, Hadoop updates, and version upgrades as required.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase.