Hadoop Developer Resume
Long Beach, CA
PROFESSIONAL SUMMARY:
- 9+ years of overall IT experience, including 5 years in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, MongoDB, Oozie, Sqoop and ZooKeeper, plus Java, J2EE, Web Services, XML, Oracle 9i/10g/11g, SQL, HTML, CSS, JSON, AngularJS, jQuery and Ajax.
- Experience analyzing data using HIVE, Pig Latin, HBase and custom Map Reduce programs in Java.
- Expertise in working with Cloudera Hadoop distribution.
- Experience in building data pipelines and defining data flow across large systems.
- Deep understanding of data import and export between relational databases and the Hadoop cluster.
- Experience in handling data load from Flume to HDFS.
- Experience in handling data import from NoSQL solutions like MongoDB to HDFS.
- Experience in data extraction and transformation using MapReduce jobs.
- Experience in Big Data Analytics using Cassandra, MapReduce and relational databases.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java; extended Hive and Pig core functionality by writing custom UDFs.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Excellent understanding and knowledge of job workflow scheduling and locking tools/services like Oozie and Zookeeper.
- Experience in writing Map Reduce jobs using Java.
- Built real-time Big Data solutions using HBase handling billions of records.
- Extensive knowledge and work experience in systems analysis, design, development, implementation and testing of application software for business solutions, database management and data analytics.
- Extensive experience in GUI design using JSP, JSF, the HMVC pattern and MVC architecture, reducing development time and effort.
- Experience in building enterprise applications and distributed systems using J2EE, EJB 2.1/3.0, OpenEJB, RMI, JPA, IBM MQ Series, ActiveMQ, OpenJPA, JDBC, JSP, Struts, Servlets, JMS, EMS, XML and JavaScript.
- Hands-on experience in writing Pig Latin scripts, working with the Grunt shell and scheduling workflows with Oozie.
- Worked with classic MapReduce and YARN Hadoop distributions such as Apache Hadoop 2.0.0, Cloudera CDH4 and CDH5.
- Used IDEs and development tools such as Eclipse, NetBeans, Sun ONE Studio, WebSphere Studio 7.0/8.0, JBuilder, WebGain Business Designer, Structure Builder and Elixir CASE, along with Visual SourceSafe for version control and ERwin for database schema design.
- Sound knowledge of RDBMS concepts; worked extensively with Oracle 8i/9i/10g/11g, DB2, SQL Server 8.0/9.0/10.0/10.5/11.0, MySQL, MS Access and Toad.
- Experienced in writing PL/SQL procedures and triggers in Oracle and stored procedures in DB2 and MySQL.
- Experience in working with columnar NoSQL databases such as HBase and Cassandra to manage extremely large data sets.
- Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tMysqlSCD, tFilter, tGlobalmap, etc.
- Hands on experience in in-memory data processing with Apache Spark.
- Configured and developed complex dashboards and reports on Splunk
- Knowledge about Splunk architecture and various components (indexer, forwarder, search head, deployment server)
- Worked with the Splunk Universal Forwarder and Heavy Forwarder.
- Strong experience with the Hortonworks and Cloudera Hadoop distributions.
- Skilled in Tableau Desktop for data visualization through various charts such as bar charts, line charts, combination charts, pivot tables, scatter plots, pie charts and packed bubbles, using multiple measures for comparison with individual, blended and dual axes.
- Published dashboard reports to Tableau Server so users could navigate the dashboards on the web.
- Coordinated with the business to understand analytics requirements.
- Automated jobs that pull data from different sources and load it into HDFS tables using Oozie workflows.
- Interfaced with SMEs, the analytics team, account managers and domain architects to review solutions before development.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
PROFESSIONAL EXPERIENCE:
Confidential, Long Beach, CA
Hadoop Developer
Responsibilities:
- Gathering requirements for near-real-time data replication into the EDL.
- Responsible for the architectural design and testing of the near-real-time data replication framework.
- Assisting with Palantir Analytics use cases for Confidential Health care.
- Responsible for gathering requirements for the daily and monthly incremental data loads.
- Providing data integration and data ingestion for the Palantir analytics use cases.
- Writing Mesa code to support the data transformation and data integration required for Palantir analytics use cases.
- Designed the daily and monthly incremental loads from SQL databases to HDFS and to NoSQL databases such as HBase.
- Implemented partitioning in Hive and created HBase snapshot tables.
- Responsible for setting up PolyBase in Microsoft SQL Server and creating the corresponding PolyBase external tables.
- Built data validation and data quality check frameworks using Talend Big Data.
- Built performance optimization checks and error logging into the framework to test the ingestion framework.
- Responsible for communicating with different teams on inbound and outbound data requirements and preparing the data to meet those requirements.
- Implemented Spark Streaming to bring real-time data into HDFS.
- Set up a Kafka cluster and configured Kafka topics for real-time data ingestion.
- Involved in developing Scala scripts that implement Spark jobs to analyze and validate the ingested data.
- Implemented image upload to and retrieval from HBase to meet SLA requirements that traditional image retrieval could not meet for some applications.
- Involved in data ingestion and framework build for a predictive use case: cardiovascular disease prediction.
- Implemented Apache Zeppelin with Hive, SQL and Spark to build charts for CVD (cardiovascular disease).
- Built an end-to-end solution to store unstructured data such as images and PDFs in Hadoop and HBase and render the data back to different web applications using REST and Thrift APIs.
- Used the native Hadoop WebHDFS REST API to expose data residing in HDFS to various web applications.
- Integrated Apache Storm with Kafka to perform text analytics on image metadata. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Used HIPI, an image processing library designed for the Apache Hadoop MapReduce parallel programming framework, to efficiently process claim-related images with MapReduce-style parallel programming.
- Providing L1 and L2 support to business users of all Palantir use cases across all states.
- Performing data validation and quality checks on data consumed by Palantir use cases.
- Updated Power BI dashboards to reflect changes in business requirements.
- Created Power BI dashboards (Power View, Power Query, Power Pivot, Power Map).
- Worked with the ELK Stack (Elasticsearch, Logstash and Kibana) to develop reports and risk indicators for archived spam and Websense data.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau
- Worked with Amazon Web Services (AWS) such as EMR and EC2 for fast and efficient processing of Big Data.
- Created views in Tableau Desktop that were published to the internal team for review, further data analysis and customization using filters and actions.
- Writing XSLT templates to transform content fetched from MarkLogic and render it in the intended format.
- Writing CoRB tasks to process content in MarkLogic in bulk.
- Analyzing search requirements and writing search queries in XQuery using the MarkLogic developer API.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
- Created system administration scripts using PowerShell, Bash and Python.
- Leading the team and supporting the onsite/offshore team.
- Writing Perl scripts to generate reports (.txt, .csv, .xlsx) on dispositions, counts and summaries for business clients.
- Worked on different file formats (PARQUET, TEXTFILE) and different compression codecs (GZIP, SNAPPY, LZO).
- Designed and developed external and managed Hive tables with data formats such as Text, Avro, SequenceFile, RC, ORC and Parquet.
Environment: HDFS, Sqoop, Flume, LINUX, Oozie, Hadoop, Pig, Hive, HBase, Cassandra, Hadoop Cluster, Amazon Web Services
Confidential, Rochester, MN
Hadoop Developer
Responsibilities:
- Good understanding of and hands-on experience with Hadoop stack internals, Hive, Pig and MapReduce. Experience in setting up clusters using Cloudera Manager.
- Migrated 160 tables from Oracle to Cassandra using Apache Spark.
- Built the front end using Spray, an actor-based framework, which proved an excellent choice for building a RESTful, lightweight and asynchronous web service.
- Implemented various routes for the application using Spray.
- Worked on the Core, Spark SQL and Spark Streaming modules of Spark extensively.
- Used Scala to write code for all Spark use cases.
- Assigned names to the columns using Scala case classes.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Used SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
- Performed performance tuning for Spark Streaming, e.g. setting the right batch interval, the correct level of parallelism, the right serialization and memory tuning.
- Involved in Spark-Cassandra data modeling.
- Performed manual and automated installation of Cloudera's Distribution including Apache Hadoop (CDH3, CDH4) environments.
- Deep understanding of schedulers, workload management, availability, scalability and distributed data platforms.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Involved in loading data from UNIX file system to HDFS.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Load and transform large sets of structured, semi structured and unstructured data.
- Wrote Pig UDFs.
- Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop.
- Developed HIVE queries for the analysts.
- Analyzed business requirements and data sources from Excel, Oracle and SQL Server for the design, development, testing and production rollout of reporting and analysis projects in Tableau.
- Implemented partitioning, dynamic partitions and bucketing in Hive.
- Exported the result set from HIVE to MySQL using Shell scripts.
- Used Zookeeper for various types of centralized configurations.
- Involved in maintaining various Unix Shell scripts.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Automated all jobs, from pulling data from sources such as MySQL to pushing result sets into the Hadoop Distributed File System, using Sqoop.
- Used SVN for version control.
- Helped the team expand the cluster from 25 nodes to 40 nodes.
- Maintained the system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Flume).
- Monitored system health and logs and responded to any warning or failure conditions.
- Knowledge of Linux Administration, UNIX Shell scripting, Python
- Hands-on experience writing MapReduce jobs in Java, Pig and Python; wrote MapReduce programs to analyze data and discover trends in data usage by users.
- Created X12 and XPath transformations and XSLT mappings using TIBCO ActiveMatrix BusinessWorks.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Worked with the architecture and testing teams to validate the usage of TIBCO products across the environment.
- Created Talend mappings to populate data into dimension and fact tables.
- Worked on improving the performance of Talend jobs.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.3 for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop.
- Developed Splunk Dashboards, searches and reporting to support various internal clients in Security, IT Operations and Application Development.
- Involved in Preparing the High Level Design (HLD) and Low Level Design (LLD) documents for ETL Informatica process.
- Responsible for the development, support and maintenance of ETL (Extract, Transform and Load) processes using Informatica.
- Installed applications on AWS EC2 instances and configured storage in S3 buckets.
- Managed servers on Amazon Web Services (AWS) instances using Puppet and Chef configuration management.
- Created Kafka topics and partitions and wrote custom partitioner classes.
- Created Kafka adapters to decouple application dependencies.
- Load tested Kafka and JMS to compare their performance statistics.
- Worked on a Proof of Concept (POC) on Cloudera Impala to compare Impala and Hive, in terms of response time with respect to large batch processing.
- Installed, configured and administered Splunk Enterprise Server and Splunk Forwarder on Windows servers.
- Managing Splunk Universal forwarder deployment and configuration. Monitoring and maintaining Splunk performance and optimization after deployment.
- Established and managed relationships with strategic Big Data partners to grow Qubole after its Series C financing.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Java 1.6, UNIX Shell Scripting
Confidential, O’Fallon, MO
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop extensively to import data from various systems and sources (such as MySQL) into HDFS.
- Managed Hadoop clusters using Cloudera Manager.
- Provisioned AWS servers using Chef recipes.
- Used Hive queries to analyze HBase data through the Hive HBase storage handler to meet business requirements.
- Created Hive UDFs to provide functionality missing from Hive for analytics.
- Hands-on experience with NoSQL databases such as HBase and Cassandra in a POC (proof of concept) for storing URLs and images.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed scripts in Python to enable dynamic resource sharing across Hadoop workloads
- Worked with cloud services such as Amazon Web Services (AWS).
- Involved in ETL, Data Integration and Migration
- Used different file formats such as text files, SequenceFiles and Avro.
- Hands on experience in installation, configuration, supporting and managing Hadoop Clusters using Apache, MapR
- Provided cluster coordination services through ZooKeeper.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in Cluster maintenance, cluster monitoring, adding and removing cluster nodes and Troubleshooting
- Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created joblets in Talend for processes reused across most jobs in a project, such as Start Job and Commit Job.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Helped the UNIX and Splunk administrators deploy Splunk across the UNIX and Windows environments.
- Used Splunk to analyze application logs.
- Prepared ETL specifications, created mappings and wrote Teradata scripts for extraction, transformation and loading of data into the data warehouse.
- Created views in Oracle to source data from the DART data warehouse and expose it to the Axiom SL reporting tool.
- Performed data analytics using Pig, Impala, Hive and R for the team's data scientists to improve business and future strategy development.
- Developed jobs in Talend Enterprise Edition covering the source, stage, intermediate, conversion and target layers.
- Created Splunk app for Enterprise Security to identify and address emerging security threats through the use of continuous monitoring, alerting and analytics.
- Interfaced with SMEs, the analytics team, account managers and domain architects to review solutions before development.
- Coordinated multiple projects with offshore team.
- Managed the direct and offshore implementation teams.
Environment: HDFS, Sqoop, Flume, LINUX, Oozie, Hadoop, Pig, Hive, HBase, Cassandra, Hadoop Cluster, Amazon Web Services
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean out unwanted data.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in creating Hive tables, then applied HiveQL on those tables for data validation.
- Moved data from Hive tables into MongoDB collections.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Worked on a Hadoop cluster that ranged from 4 to 8 nodes during the pre-production stage and was sometimes extended to up to 24 nodes in production.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Experienced in installing, upgrading and managing Apache, Cloudera (CDH4) and Hortonworks Hadoop distributions.
- Participated in requirement gathering from the experts and business partners and converted the requirements into technical specifications.
- Used Zookeeper to manage coordination among the clusters
- Analyzed MongoDB and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
- Assisted in exporting the analyzed data to RDBMS using Sqoop.
- Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that are triggered independently by time and data availability.
- Assisted application teams in installing operating system and Hadoop updates, patches and version upgrades when required.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting; managed and reviewed data backups and log files.
- Experience in writing Python Scripts.
- Worked with administrators to ensure Splunk is actively and accurately running and monitoring on the current infrastructure implementation.
- Extensive experience in setting up Splunk to monitor customer volume and track customer activity. Involved as a Splunk admin in capturing, analyzing and monitoring front-end and middleware applications.
Confidential, Bloomington, IL
Hadoop Developer
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts
- Involved in installing Hadoop Ecosystem components.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Involved in loading data from LINUX file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Implemented test scripts to support test driven development and continuous integration.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Installed Oozie workflow engine to run multiple Map Reduce jobs.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase.
Confidential
Jr. Java Developer
Responsibilities:
- Inventory & Purchase order Application:
- Maintained the information about the materials used by the production department.
- Issuing materials from inventory, checking stock and raising quotations.
- Raising purchase orders after a comparative study of quotations received from different vendors.
- Sales & Purchase Information system:
- Automation of the sales and purchase system with customers and vendors.
- Maintaining products related details and recording the transactions for invoices, customer payments and updating stock.
- User Interface application for loco remote monitoring system:
- Show the present and past information about any locomotives supplied by Medha.
- Display all important characteristics and the live status of running trains.
- Show the faults lists of any locomotive in a specific order selected by user.
