
Hadoop Developer Resume

Phoenix, AZ

SUMMARY

  • Overall 10 years of professional IT experience, including 5 years in analysis, architectural design, prototyping, development, integration, and testing of applications using Java/J2EE technologies and 2 years in Big Data analytics as a Hadoop Developer.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Hands-on experience in creating Apache Spark RDD transformations on datasets in the Hadoop data lake (a Spark RDD sketch follows this summary).
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Hands-on experience working with NoSQL databases including HBase and Cassandra and their integration with the Hadoop cluster.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Experienced in data ingestion projects, ingesting data into the data lake from multiple source systems using Talend Big Data.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS.
  • Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
  • Hands-on experience in solving software design issues by applying design patterns including the Singleton, Business Delegate, Controller, MVC, Factory, Abstract Factory, DAO, and Template patterns.
  • Experienced in creative and effective front-end development using JSP, JavaScript, HTML5, DHTML, XHTML, Ajax, and CSS.
  • Experience in analysis, design, development, and integration using Big Data/Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka Streaming, HBase, Azure, AWS, Cloudera, Hortonworks, Impala, Avro, data processing, Java/J2EE, and SQL.
  • Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, Hive, Spark, Scala, Spark-SQL, MapReduce, Pig, Sqoop, Flume, HBase, Zookeeper, and Oozie.
  • Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience in the cloud on Confidential Azure using Azure Cloud, Azure Data Factory, Azure Data Lake Analytics, Azure Databricks, Git, Azure DevOps, and Azure SQL Data Warehouse.
  • Proven architecture experience across IT, including client-server technologies (J2EE, .NET), cloud technologies (Azure, AWS), security architecture (SAML 2.0, Okta), security user provisioning (CA solutions), MuleSoft, and WebSphere Message Broker (WMB/WBIMB) interfaces and MQ Series.
  • Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB)
  • Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure
  • Experience in datacenter migration and Azure Data Services, with strong virtualization experience.
  • Experience in troubleshooting and resolving architecture problems including database and storage, network, security and applications
  • Extensive experience in developing strategies for Extraction, Transformation and Loading data from various sources into Data Warehouse and Data Marts using DataStage.
  • Extensive experience in data integration and migration using IBM InfoSphere DataStage (9.1), QualityStage, SSIS, Oracle, Teradata, DB2, SQL, and shell scripting, along with technical certifications in ETL development from IBM and Cloudera.
  • Good exposure to function point analysis for estimation, planning, and design on the DataStage platform through implementation.
  • Extensive ETL tool experience using IBM Talend Enterprise edition, InfoSphere/WebSphere DataStage, Ascential DataStage, Bigdata Hadoop and SSIS. Worked on DataStage client tools like DataStage Designer, DataStage Director and DataStage Administrator.
  • Experienced in scheduling sequence, parallel and server jobs using DataStage Director, UNIX scripts and scheduling tools. Designed and developed parallel jobs, server and sequence jobs using DataStage Designer.
  • Implement ad-hoc analysis solutions using Azure Data Lake Analytics/Store, HDInsight
  • Design & implement migration strategies for traditional systems on Azure (lift and shift/Azure Migrate, other third-party tools); worked on the Azure suite: Azure SQL Database, Azure Data Lake (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service, and Azure data platform services.
  • Design and implement end-to-end data solutions (storage, integration, processing, visualization) in Azure
  • Experience managing Azure Data Lake (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure services; knowledge of U-SQL and how it can be used for data transformation as part of a cloud data integration strategy.
  • Experience configuring the Hive metastore with MySQL, which stores the metadata for Hive tables. Extensive experience creating data pipelines for real-time streaming applications using Kafka Streaming, Flume, Storm, and Spark Streaming, and performing sentiment analysis on a Twitter source (a Kafka/Spark Streaming sketch follows this summary).
  • Architect the data lake by cataloging the source data, analyzing entity relationships, and aligning the design as per performance, schedule & reporting requirements
  • Experience in development, support, and maintenance of ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
  • Experience in submitting Talend jobs for scheduling using Talend scheduler which is available in the Admin Console.
  • IBM ETL Talend/DataStage developer with 8+ years in information technology, having worked on the design, development, administration, and implementation of various database and data warehouse technologies (IBM Talend Enterprise Edition and DataStage v9.x/8.x/7.x) using components like Administrator, Manager, Designer, and Director.
  • Around 3 years of working experience in Talend (ETL tool), developing and leading the end-to-end implementation of Big Data projects, with comprehensive experience as a Hadoop Developer in the Hadoop ecosystem: Hadoop, MapReduce, Hadoop Distributed File System (HDFS), Hive, Impala, YARN, Oozie, Hue, and Spark.
  • Expertise in working with various databases, writing SQL queries, stored procedures, functions, and triggers using PL/SQL and SQL.
  • Experience in NoSQL column-oriented databases like Cassandra, HBase, MongoDB, and FiloDB and their integration with the Hadoop cluster.
  • Design and development experience in ETL of data from various source systems using Sqoop, Hive, Pig, and the data lake for analytics.
  • Expertise in building data lake solutions for enterprises using open source Hadoop and Cloudera distribution technologies.
  • Experience with Big Data technologies in highly scalable, end-to-end Hadoop infrastructure to solve business problems, including building large-scale data pipelines, data lakes, data warehouses, real-time analytics, and reporting solutions.
  • Implemented an AWS data lake leveraging S3, Terraform, Vagrant/Vault, EC2, Lambda, VPC, and IAM for data processing and storage, writing complex SQL queries and analytical and aggregate functions on views in the Snowflake data warehouse to develop near-real-time visualizations using Tableau Desktop/Server 10.4 and Alteryx.
  • Good exposure to web services using CXF/XFire and Apache Axis for the exposure and consumption of SOAP messages.
  • Working knowledge of database such as Oracle 8i/9i/10g, Microsoft SQL Server, DB2. Experience in writing numerous test cases using JUnit framework with Selenium.
  • Leveraged AWS, Informatica Cloud, Snowflake Data Warehouse, the HashiCorp platform, AutoSys, and Rally Agile/Scrum to implement data lake, enterprise data warehouse, and advanced data analytics solutions based on data collection and integration from multiple sources (Salesforce, SalesConnect, S3, SQL Server, Oracle, NoSQL, and mainframe systems).
  • Strong work ethic with a desire to succeed and make significant contributions to the organization. Strong problem-solving skills, good communication and interpersonal skills, and a good team player.
  • Expertise in writing Hadoop jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka Streaming, Scala, Oozie, and Talend ETL.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and MapReduce concepts.
  • Strong understanding of data warehouse and data lake technology.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java. Experience in importing/exporting data using Sqoop into HDFS from relational database systems and vice versa.
  • Good at developing Big Data-based solutions using Hadoop and Spark, as well as in information retrieval and machine learning.
  • Experienced in implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka.
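
Illustrative Spark RDD sketch (referenced above): a minimal example of the kind of RDD transformation work described in this summary, assuming a hypothetical comma-delimited event file in HDFS; the path, field layout, and object names are placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object RddTransformationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-transformation-sketch")
      .getOrCreate()

    // Hypothetical comma-delimited events: userId,action,durationMs
    val lines = spark.sparkContext.textFile("hdfs:///data/lake/events/part-*")

    // Classic RDD transformations: parse, filter out malformed rows, aggregate per key
    val totalsByUser = lines
      .map(_.split(","))
      .filter(_.length == 3)
      .map(fields => (fields(0), fields(2).toLong))
      .reduceByKey(_ + _)

    totalsByUser.take(10).foreach(println)
    spark.stop()
  }
}
```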
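
Illustrative Kafka/Spark Streaming sketch (referenced above): a minimal micro-batch pipeline reading from Kafka and landing raw events in HDFS, assuming the spark-streaming-kafka-0-10 integration; broker addresses, topic, group id, and output paths are hypothetical. The direct stream approach is shown because it lets Spark manage Kafka offsets per micro-batch.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-streaming-sketch"), Seconds(10))

    // Placeholder Kafka consumer settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-sketch",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch of raw values to HDFS
    stream.map(_.value()).foreachRDD { rdd =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```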

TECHNICAL SKILLS

Big Data Technologies: Hive, Hadoop, MapReduce, HDFS, Sqoop, R, Flume, Spark, Apache Kafka, HBase, Pig, Elasticsearch, AWS, Oozie, ZooKeeper, Apache Hue, Apache Tez, YARN, Talend, Storm, Impala, Tableau, and QlikView.

Programming Languages: Java (JDK 1.4/1.5/1.6), J2EE, C, C++, Scala, Python, R, MATLAB, SQL, PL/SQL, T-SQL, Pig Latin, HiveQL, HTML

Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x, and JPA

Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey

Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML5, XHTML

Operating Systems: UNIX, Windows, LINUX

Application Servers: IBM WebSphere, Apache Tomcat, WebLogic

Web Technologies: JSP, Servlets, Socket Programming, JNDI, JDBC, JavaBeans, JavaScript, Web Services (JAX-WS)

Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2, MySQL 4.x/5.x

Java IDEs: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

Tools: TOAD, SQL Developer, SOAP UI, ANT, Maven, Visio, Rational Rose, Datastage

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, Phoenix, AZ

Responsibilities:

  • Hands-on experience with Apache Spark and its components (Spark Core and Spark SQL), including in-memory data processing with Apache Spark and Apache NiFi/MiNiFi.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in the Design phase and delivered Design documents.
  • Involved in testing and coordination with the business during user testing. Imported and exported data into HDFS and Hive using Sqoop. Developed a working prototype for real-time data ingestion and processing using Kafka, Spark Streaming, and HBase.
  • Analyze, design and build Modern data solutions using Azure PaaS service to support visualization of data. Understand current Production state of application and determine the impact of new implementation on existing business processes.
  • Migrating Exadata/Informatica projects to Talend.
  • Created Hive and Vertica queries to help market analysts spot emerging trends by comparing fresh data with reference tables; used different Talend components (tOracleInput, tOracleOutput, tHiveInput, tHiveOutput, tHiveRow, tVerticaInput, tVerticaOutput, tVerticaRow, tUniqRow, tAggregateRow, tRunJob, tPreJob, tPostJob, tMap, tJavaRow, tJavaFlex, tFilterRow, etc.) to develop standard jobs.
  • Loaded data from different sources (databases and files) into Hive using Talend (standard, MapReduce, and Spark jobs); monitored system health and logs and responded to any warning or failure conditions.
  • Loaded and transformed data into HDFS from large sets of structured data in Oracle/SQL Server using Talend Big Data studio.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Data analytics and engineering experience in multiple Azure platforms such as Azure SQL, Azure SQL Data warehouse, Azure Data Factory, Azure Storage Account etc. for source stream extraction, cleansing, consumption and publishing across multiple user bases.
  • Involved in the data ingestion process through DataStage to load data into HDFS from mainframes, Greenplum, Teradata, and DB2.
  • Created ETL guidelines document which involves coding standards, naming conventions for development and production support log and root cause analysis documents for troubleshooting DataStage jobs.
  • Created Datastage jobs using different stages like Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, Row Generator, Etc.
  • Coded generic reusable DataStage components for loading and unloading data to and from Teradata.
  • Have experience in migrating DataStage jobs from 8.1 and 8.5 versions to 9.1.
  • Created an Azure Data Factory pipeline to insert flat file and ORC file data into Azure SQL.
  • Cloud-based report generation, development, and implementation using SCOPE constructs and Power BI. Expert in U-SQL constructs for interacting with multiple source streams within Azure Data Lake.
  • Design and implement end-to-end data solutions (storage, integration, processing, visualization) in Azure
  • Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL
  • Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB)
  • Design & implement migration strategies for traditional systems on Azure (lift and shift/Azure Migrate, other third-party tools).
  • Involved in developing a linear regression model to predict a continuous measurement for improving observations on wind turbine data, developed using Spark with the Scala API.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Implemented Spark using Scala and utilizing Data frames and Spark SQL API for faster processing of data.
  • Used Spark and Spark SQL to read the Parquet data and create the tables in Hive using the Scala API (see the Parquet-to-Hive sketch after this list).
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala.
  • Experienced in developing scripts for doing transformations using Scala .
  • Used Kafka for publish-subscribe messaging as a distributed commit log, and experienced its speed, scalability, and durability.
  • Responsible for building scalable distributed data solutions using Hadoop. Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios. Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with different compression techniques (LZO, Snappy, Bzip2, etc.) to save data and optimize data transfer over the network.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying. Used Sqoop to store the data into HBase and Hive.
  • Knowledgeable in Spark and Scala framework exploration for the transition from Hadoop/MapReduce to Spark.
  • Worked on installing cluster, commissioning & decommissioning of DataNode, NameNode high availability, capacity planning, and slots configuration.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL (see the partitioning sketch after this list). Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes. Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase for creating HBase tables to load large sets of semi structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop. Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Developed UDFs in Java for Hive and Pig; worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement them using Scala.
  • Developed analytical component using Scala, Spark and Spark Stream.
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig. Developed Simple to complex Map and Reduce Jobs using Hive and Pig. Worked on Apache Spark for in memory data processing into Hadoop.
  • Responsible for building scalable distributed data solutions using MongoDB and Cassandra.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Provided support to data analysts in running Pig and Hive queries. Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL). Applied transformations and filtered traffic using Pig.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster. Responsible for building scalable distributed data solutions on a cluster using Cloudera Distribution.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs. Setup and benchmarked Hadoop and HBase clusters for internal use.
  • Responsible for building scalable distributed data solutions using Hadoop. Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Involved in loading data from an Oracle database into HDFS using Sqoop queries. Implemented MapReduce programs to get Top-K results by following MapReduce design patterns.
  • Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Implemented reads from different sources using multiple input formats with GenericWritable and ObjectWritable. Implemented best-income logic using Pig scripts and joins to transform data to AutoZone custom formats.
  • Implemented custom comparators and partitioners to implement secondary sorting. Worked on tuning the performance of Hive queries. Implemented Hive generic UDFs to implement business logic.
  • Responsible for managing data coming from different sources. Configured time-based schedulers that get data from multiple sources in parallel using Oozie workflows. Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Zookeeper for providing coordinating services to the cluster. Coordinated with end users for designing and implementation of analytics solutions for User Based Recommendations using R as per project proposals.
  • Assisted in monitoring the Hadoop cluster using Ganglia. Implemented test scripts to support test-driven development and continuous integration.
  • Configured build scripts for multi module projects with Maven and Jenkins CI. Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting. Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as ETL tool to do Transformations, even joins and some pre-aggregations before storing the data onto HDFS.
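
Illustrative Parquet-to-Hive sketch (referenced above): a minimal example of reading Parquet data with Spark SQL in Scala and exposing it as a Hive table; the HDFS path and table name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-hive-sketch")
      .enableHiveSupport()          // requires a configured Hive metastore
      .getOrCreate()

    // Hypothetical Parquet landing zone in the data lake
    val readings = spark.read.parquet("hdfs:///data/lake/turbine_readings")

    // Register the data as a Hive table so analysts can query it with HiveQL
    readings.write.mode("overwrite").saveAsTable("turbine_readings")

    spark.sql("SELECT COUNT(*) FROM turbine_readings").show()
    spark.stop()
  }
}
```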
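
Illustrative Hive dynamic-partitioning sketch (referenced above): HiveQL issued through Spark's Hive support, showing a partitioned target table loaded with dynamic partitions; database, table, and column names are placeholders, and bucketing is omitted for brevity.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow Hive-style dynamic partitioning for the INSERT below
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Partitioned target table (illustrative schema)
    spark.sql("""
      CREATE TABLE IF NOT EXISTS page_views (
        user_id STRING, url STRING, view_ts TIMESTAMP)
      PARTITIONED BY (view_date STRING)
      STORED AS ORC""")

    // Load from a staging table, deriving the partition value from the timestamp
    spark.sql("""
      INSERT OVERWRITE TABLE page_views PARTITION (view_date)
      SELECT user_id, url, view_ts, date_format(view_ts, 'yyyy-MM-dd') AS view_date
      FROM raw_page_views""")

    spark.stop()
  }
}
```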

Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, LINUX, Oozie, Hue, HCatalog, Java, IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting

Hadoop Developer

Confidential

Responsibilities:

  • As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce Frameworks, HBase, Hive, Oozie, Flume, Sqoop etc.
  • Designed and Implemented real-time Big Data processing to enable real-time analytics, event detection and notification for Data-in-Motion .
  • Hands-on experience with IBM Big Data product offerings such as IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL .
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats like text files and CSV files.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Experienced in creating data pipelines integrating Kafka streaming with Spark Streaming applications, using Scala for writing the applications.
  • Used Spark SQL for reading data from external sources and processed the data using the Scala computation framework.
  • Created many complex ETL jobs for data exchange from and to database servers and various other systems including RDBMS, XML, CSV, and flat file structures. Integrated Java code inside Talend Studio by using components like tJavaRow, tJava, tJavaFlex, and Routines.
  • Experienced in using the debug mode of Talend to debug a job and fix errors.
  • Responsible for developing, support and maintenance for the ETL (Extract, Transform and Load) processes using Talend Integration Suite.
  • Conducted JAD sessions with business users and SME's for better understanding of the reporting requirements.
  • Developed Talend jobs to populate the claims data to data warehouse - star schema.
  • Used Talend Admin Console Job conductor to schedule ETL Jobs on daily, weekly, monthly and yearly basis.
  • Worked on various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie etc.
  • Worked Extensively on Talend Admin Console and Schedule Jobs in Job Conductor.
  • Extensive ETL tool experience using IBM Talend Enterprise edition, InfoSphere/WebSphere DataStage, Ascential DataStage, Bigdata Hadoop and SSIS. Worked on DataStage client tools like DataStage Designer, DataStage Director and DataStage Administrator.
  • Experienced in scheduling sequence, parallel and server jobs using DataStage Director, UNIX scripts and scheduling tools. Designed and developed parallel jobs, server and sequence jobs using DataStage Designer.
  • Worked on the Architecture of ETL process. Created DataStage jobs (ETL Process) for populating the data into the Data warehouse constantly from different source systems like ODS, flat files, scheduled the same using DataStage Sequencer for SI testing.
  • Developed software to process, cleanse, and report on vehicle data utilizing various analytics and REST APIs, with languages and frameworks like Java, Scala, and the Akka asynchronous programming framework.
  • Involved in developing an asset tracking project in which we collected real-time vehicle location data from a JMS queue using IBM Streams and processed that data for vehicle tracking using ESRI GIS mapping software, Scala, and the Akka actor model.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data from HBase through Sqoop and placing it in HDFS for further processing.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster. Involved in creating Hive tables, loading data and running hive queries on the data.
  • Extensive Working knowledge of partitioned table, UDFs, performance tuning, compression-related properties, thrift server in Hive.
  • Worked with the NoSQL database HBase to create tables and store data. Developed optimal strategies for distributing the web log data over the cluster and for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Developed Pig scripts for the analysis of semi-structured data. Developed and evaluated custom Pig Load and Store functions.
  • Developed Java MapReduce programs on log data to transform it into a structured form to find user location, age group, and time spent.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website (see the web-log analytics sketch after this list).
  • Developed and involved in the industry specific UDF (User Defined Functions). Involved in developing optimized Pig Script and testing Pig Latin Scripts.
  • Experience working with Apache Solr for indexing and querying. Wrote JUnit test cases for the Storm topology. Configured the Kafka MirrorMaker cross-cluster replication service.
  • Monitored multiple Hadoop clusters environments using Ganglia. Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Monitored workload, job performance and capacity planning using Cloudera Manager. Involved in developing web-services using REST, HBase Native API and BigSQL Client to query data from HBase. Experienced in Developing Hive queries in BigSQL Client for various use cases.
  • Involved in developing few Shell Scripts and automated them using CRON job scheduler. Implemented test scripts to support test driven development and continuous integration. Responsible to manage data coming from different sources.
  • Experience in creating architecture on AWS and deploying Hadoop clusters using AWS EC2. Created and accessed AWS S3 buckets. Connected to AWS EC2 using SSH and ran spark-submit jobs.
  • Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs like Pig, Hive, Sqoop, and MapReduce.
  • Developed a working prototype for real time data ingestion and processing using Kafka Streaming, Spark Streaming, and HBase.
  • Developed Kafka Streaming producer and Spark Streaming consumer to read the stream of events as per business rules. Loaded various formats of structured and unstructured data from Linux file system to HDFS.
  • Used Combiners and Partitioners in MapReduce programming and worked on high volume heterogeneous data. Written Pig Scripts to ETL the data into NOSQL database for faster analysis.
  • Read from Flume and involved in pushing batches of data to HDFS and HBase for real time processing of the files. Parsing XML data into structured format and loading into HDFS. Scheduled various ETL process and Hive scripts by developing Oozie workflow.
  • Utilized Tableau to visualize the analyzed data and performed report design and delivery. Created POC for Flume implementation. Worked on Linux/Unix
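
Illustrative web-log analytics sketch (referenced above): a HiveQL aggregation of the kind described in this role, computing unique visitors and page views per day; it is submitted here through Spark's Hive support, and the table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object WebLogAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-analytics-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Unique visitors and page views per day from a hypothetical access_logs table
    spark.sql("""
      SELECT to_date(request_ts)        AS log_date,
             COUNT(DISTINCT visitor_id) AS unique_visitors,
             COUNT(*)                   AS page_views
      FROM access_logs
      GROUP BY to_date(request_ts)
      ORDER BY log_date""").show(30)

    spark.stop()
  }
}
```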

Environment: Hadoop 1x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL, Java

Hadoop Developer

Confidential, Chicago, IL

Responsibilities:

  • Used the REST API to access HBase data to perform analytics. Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume. Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs. Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts to do transformations, joins, and some pre-aggregations before storing the data onto HDFS.
  • Worked with Avro Data Serialization system to work with JSON data formats . Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Involved in Unit testing and delivered Unit test plans and results documents using Junit and MRUnit . Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Worked on Oozie workflow engine for job scheduling. Created and maintained Technical documentation for launching HADOOP Clusters and for executing Pig Scripts.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing. Designed and developed Oozie workflows for automating jobs. Created HBase tables to store variable data formats of data coming from different portfolios (see the HBase sketch after this list).
  • Writing Hadoop MR Programs to get the logs and feed into Cassandra for Analytics purpose. Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Implemented best income logic using Pig scripts. Moving data from Oracle to HDFS and vice-versa using SQOOP. Developed Pig scripts to convert the data from Avro to Text file format.
  • Developed Hive scripts for implementing control tables logic in HDFS. Developed Hive queries and UDF's to analyze/transform the data in HDFS. Designed and Implemented Partitioning (Static, Dynamic) Buckets in HIVE.
  • Worked with different file formats and compression techniques to determine standards. Installed and configured Hive and wrote Hive UDFs.
  • Developed Oozie workflows and they are scheduled through a scheduler on a monthly basis. Designed and developed read lock capability in HDFS.
  • Involved in End-to-End implementation of ETL logic. Involved in designing use-case diagrams, class diagram, interaction using UML model. Designed and developed the application using various design patterns, such as session facade, business delegate and service locator.
  • Worked on Maven build tool. Involved in developing JSP pages using Struts custom tags, JQuery and Tiles Framework. Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation.
  • Good experience in Mule development. Developed Web applications with Rich Internet applications using Java applets, Silverlight, JavaFX. Involved in creating Database SQL and PL/SQL queries and stored Procedures.
  • Implemented Singleton classes for property loading and static data from DB. Debugged and developed applications using Rational Application Developer (RAD). Developed a Web service to communicate with the database using SOAP.
  • Developed DAO (Data Access Objects) using Spring Framework 3. Deployed the components in to WebSphere Application server 7. Actively involved in backend tuning SQL queries/DB script.
  • Worked on writing commands using UNIX shell scripting. Used Java to remove an attribute from a JSON file where Scala did not support creating the required objects, then converted back to Scala. Worked on Java and Impala and master clean-up of data.
  • Worked on accumulators to count results after executing the job on multiple executors (see the accumulator sketch after this list). Worked in the IntelliJ IDE for development and debugging. Worked on Linux/Unix.
  • Wrote a whole set of programs for one of the LOB's in Scala and made unit testing. Created many SQL schemas and utilized them throughout the program wherever required. Made enhancements to one of the LOBs using Scala programming.
  • Ran spark-submit job and analyzed the log files. Used Maven to build .jar files, Used Sqoop to transfer data between relational databases and Hadoop.
  • Worked on HDFS to store and access huge datasets within Hadoop. Good hands-on experience with Git and GitHub; created a feature branch on GitHub.
  • Pushed the changes to GitHub and made a pull request. Experience in JSON and CFF.
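
Illustrative HBase write sketch (referenced above): a minimal example of writing a record into an HBase table with the standard Java client API from Scala; the table name, row key, and column family are placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseWriteSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath on a configured cluster node
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Hypothetical table with a single column family "d"
      val table = connection.getTable(TableName.valueOf("portfolio_events"))
      val put = new Put(Bytes.toBytes("cust123#2016-01-01"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
        Bytes.toBytes("""{"status":"active"}"""))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```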
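
Illustrative accumulator sketch (referenced above): a driver-side counter that executors increment while an action runs, here used to count malformed records during parsing; the input path and delimiter are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("accumulator-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Long accumulator shared between the driver and all executors
    val malformed = sc.longAccumulator("malformed-records")

    val parsed = sc.textFile("hdfs:///data/raw/records/*")   // illustrative path
      .flatMap { line =>
        val fields = line.split("\\|")
        if (fields.length == 4) Some(fields) else { malformed.add(1); None }
      }

    // The count() action triggers evaluation; the accumulator is then read on the driver
    println(s"Parsed records: ${parsed.count()}, malformed: ${malformed.value}")
    spark.stop()
  }
}
```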

Environment: Java EE 6, IBM WebSphere Application Server 7, Apache-Struts 2.0, EJB 3, Spring 3.2, JSP 2.0, WebServices, JQuery 1.7, Servlet 3.0, Struts-Validator, Struts-Tiles, Tag Libraries, ANT 1.5, JDBC, Oracle 11g/SQL, JUNIT 3.8, CVS 1.2, Rational Clear Case, Eclipse 4.2, JSTL, DHTML.

Hadoop Developer

Confidential

Responsibilities:

  • Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems. Worked on the proof-of-concept POC for Apache Hadoop framework initiation.
  • Worked on numerous POCs to prove whether Big Data is the right fit for a business case. Developed MapReduce jobs for log analysis, recommendations, and analytics.
  • Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance. Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop. Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios. Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with different compression techniques (LZO, Snappy, Bzip2, etc.) to save data and optimize data transfer over the network.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (see the custom aggregate sketch after this list). Used Sqoop to store the data into HBase and Hive.
  • Worked on installing cluster, commissioning & decommissioning of Data Node, Name Node high availability, capacity planning, and slots configuration.
  • Creating Hive tables, dynamic partitions, buckets for sampling, and working on them using HiveQL. Used Pig to parse the data and Store in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive Serdes. Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase for creating HBase tables to load large sets of semi structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop. Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Fine-tuned Pig queries for better performance. Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team. Installed Oozie workflow engine to run multiple Hive and pig jobs.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS. Processed HDFS data and created external tables using Hive, in order to analyze visitors per day, page views and most purchased products.
  • Exported analyzed data to HDFS using Sqoop for generating reports. Used Map-reduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
  • Developed Hive queries for the analysts. Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS/Cassandra cluster.
  • Implemented reads from different sources using multiple input formats with GenericWritable and ObjectWritable. Implemented best-income logic using Pig scripts and joins to transform data to AutoZone custom formats.
  • Implemented custom comparators and partitioners to implement secondary sorting. Worked on tuning the performance of Hive queries. Implemented Hive generic UDFs to implement business logic.
  • Responsible for managing data coming from different sources. Configured time-based schedulers that get data from multiple sources in parallel using Oozie workflows. Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
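
Illustrative custom aggregate sketch (referenced above): a typed Spark SQL aggregate (a running average) registered for use from SQL, assuming the Spark 3.x Aggregator/udaf API; the table and column names in the query are placeholders. The typed Aggregator is shown because it replaces the older untyped UDAF interface while still being callable from plain SQL.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Aggregation buffer carried between rows and merged across partitions
case class AvgBuffer(sum: Double, count: Long)

// Typed aggregator computing an average of Double values
object AvgAggregator extends Aggregator[Double, AvgBuffer, Double] {
  def zero: AvgBuffer = AvgBuffer(0.0, 0L)
  def reduce(b: AvgBuffer, v: Double): AvgBuffer = AvgBuffer(b.sum + v, b.count + 1)
  def merge(a: AvgBuffer, b: AvgBuffer): AvgBuffer = AvgBuffer(a.sum + b.sum, a.count + b.count)
  def finish(b: AvgBuffer): Double = if (b.count == 0) 0.0 else b.sum / b.count
  def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object CustomAggregateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("custom-aggregate-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Register the aggregator so it can be called from SQL as my_avg(...)
    spark.udf.register("my_avg", udaf(AvgAggregator))

    // "telemetry" is an illustrative Hive table
    spark.sql("SELECT sensor_id, my_avg(reading) AS avg_reading FROM telemetry GROUP BY sensor_id").show()
    spark.stop()
  }
}
```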

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Kafka Streaming, Flume, Oozie, ZooKeeper, Oracle 11g, Core Java, FiloDB, Spark, Akka, Scala, MapR, Hortonworks, Ambari, Cloudera Manager, Informatica, MicroStrategy, Azure Data, Talend, Eclipse, Web Services (SOAP, WSDL), Node.js, Unix/Linux, AWS, jQuery, Ajax, Python, Perl
