
Sr Hadoop Developer Resume


Bellevue, WA

SUMMARY:

  • 6 years of professional experience in designing, developing, and debugging Web-based as well as Enterprise applications using OOA, OOD, OOP, and Java/J2EE technologies.
  • Over 4 years of experience with Hadoop ecosystem components such as MapReduce, HBase, Oozie, Hive, Sqoop, Pig, Flume, YARN, Impala, and Cassandra on Cloudera and Hortonworks distributions.
  • Experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in converting MapReduce applications to Spark (a short sketch follows this list).
  • Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Extensively worked on Spark Streaming and Apache Kafka to process live streaming data.
  • Extensive experience developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
  • Good working experience using Sqoop to import data from various RDBMS sources into HDFS and export it back.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Working knowledge of software design patterns, big data technologies (Hadoop, Hortonworks Sandbox), and cloud technologies and design.
  • Knowledge of running Hive queries through Spark SQL integrated with the Spark environment, implemented in Scala.
  • Experience in setting up Hadoop clusters on cloud platforms like AWS.
  • Knowledge of Hadoop and its sub-modules: HDFS, MapReduce, Apache Pig, Hive, HBase, and Sqoop.
  • Experience with NoSQL databases like CouchDB, MongoDB.
  • Good experience with workflow scheduling and cluster coordination tools like Oozie and ZooKeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
  • Experienced in using Agile software methodology (Scrum).
  • Experience in developing distributed Enterprise and Web applications using UML, Java/J2EE, and Web technologies including EJB, JSP, Servlets, Struts, JMS, JDBC, JPA, HTML, XML, XSL, XSLT, JavaScript, Spring, and Hibernate.
  • Expertise in using J2EE application servers like WebLogic 8.1/9.2 and IBM WebSphere 7.x/6.x, and web servers like Tomcat 5.x/6.x.
  • Designed use case diagrams, class diagrams, activity diagrams, sequence diagrams, flow charts, and deployment diagrams using Rational Rose.
  • Proficient in writing and handling SQL Queries, Stored Procedures, and triggers.
  • Knowledge of operating systems including Linux and Windows, and of UNIX shell scripting.
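
To illustrate the MapReduce-to-Spark conversion noted above, here is a minimal PySpark sketch of a word-count-style aggregation replacing a classic mapper/reducer pair; the input path is a hypothetical placeholder, not a detail from any specific project.

```python
# Minimal PySpark sketch: a word-count-style aggregation that replaces a
# classic MapReduce mapper/reducer pair. The input path is a placeholder.
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mapreduce-to-spark").getOrCreate()

# Read raw text lines from HDFS.
lines = spark.sparkContext.textFile("hdfs:///data/input/logs")

counts = (
    lines.flatMap(lambda line: line.split())   # "map" phase: emit one record per word
         .map(lambda word: (word, 1))
         .reduceByKey(add)                     # "reduce" phase: sum counts per key
)

for word, count in counts.take(20):
    print(word, count)

spark.stop()
```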

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hadoop Streaming, ZooKeeper, Kafka, Oozie, Sqoop, Hive, Pig, HBase, Spark, Cloudera CDH 3/4/5, Airflow, Flume.

NoSQL: HBase, Cassandra, MongoDB

Languages: Java/ J2EE, SQL, Shell Scripting, Python

Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP.

Web/Application Servers: Apache Tomcat, LDAP, JBoss, IIS

Operating Systems: Windows, Linux, and UNIX

Frameworks: Spring, MVC, Hibernate, Swing

DBMS / RDBMS/ETL: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL, DataStage 7.5.2, SSIS

Reporting Tools: Tableau Server 10.1

Version Control: SVN, CVS, and Rational ClearCase Remote Client

PROFESSIONAL EXPERIENCE:

Confidential, Bellevue, WA

Sr Hadoop Developer

Responsibilities:

  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Created complex SQL stored procedures and developed reports using Microsoft SQL Server 2012.
  • Designed SSIS (ETL) Packages to extract data from various heterogeneous data sources such as Access database, Excel spreadsheet and flat files into SQL Server.
  • Created the packages in SSIS (ETL) with the help of Control Flow Containers, Tasks and Data Flow Transformations.
  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Imported and exported the analyzed data to and from relational databases using Sqoop for visualization and for generating reports for the BI team.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used Flume to handle streaming data and load it into the Hadoop cluster.
  • Developed and executed Hive queries for de-normalizing the data.
  • Developed an Apache Storm, Kafka, and HDFS integration project to perform real-time data analysis.
  • Executed Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Developed Bash scripts to pull T-log files from the FTP server and process them for loading into Hive tables.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Worked on analyzing data with Hive and Pig.
  • Designed an Apache Airflow entity resolution module for data ingestion into Microsoft SQL Server.
  • Developed a batch processing pipeline to process data using Python and Airflow, and scheduled Spark jobs with Airflow (a minimal DAG sketch follows this list).
  • Created a new Airflow DAG to find popular items in Redshift and ingest them into the main Postgres database via a web service call.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
  • Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed Spark-based solutions for the process.
  • Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Worked with Kafka on a proof of concept for log processing on a distributed system.
  • Scheduled all Bash scripts using the Resource Manager scheduler.
  • Developed Pig scripts to transform data into a structured format, automated through Oozie coordinators.
  • Worked on MongoDB, HBase (NoSQL) databases which differ from classic relational databases.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, and Spark on YARN.
  • Developed MapReduce programs for applying business rules on the data.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Involved in joining and data aggregation using Apache Crunch.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Developed Kafka consumers in Scala to consume data from Kafka topics.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data from various sources.
  • Managed day-to-day cluster operations, backups, and support.
  • Defined best practices for creating Tableau dashboards: matching requirements to chart types, choosing color patterns per user needs, and standardizing dashboard size, look, and feel.
  • Developed visualizations using sets, parameters, calculated fields, actions, sorting, filtering, and parameter-driven analysis.
  • Set up Hadoop clusters on cloud platforms such as AWS.
  • Responsible for building scalable distributed data solutions on a Hadoop cluster with the Hortonworks distribution.
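
The Airflow scheduling mentioned above can be sketched roughly as follows, assuming Airflow 2.x with the BashOperator; the JDBC URL, table name, directories, and script path are hypothetical placeholders rather than details from the actual project.

```python
# Minimal Airflow DAG sketch: daily Sqoop import followed by a Spark job.
# Paths, JDBC URL, and table names below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

with DAG(
    dag_id="daily_ingest_and_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Land the day's rows from the RDBMS into HDFS with Sqoop.
    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import "
            "--connect jdbc:teradata://td-host/DATABASE=sales "
            "--table transactions "
            "--target-dir /data/raw/transactions/{{ ds }} "
            "-m 4"
        ),
    )

    # Transform the landed files with a Spark job submitted to YARN.
    spark_transform = BashOperator(
        task_id="spark_transform",
        bash_command=(
            "spark-submit --master yarn "
            "/opt/jobs/transform_transactions.py "
            "--input /data/raw/transactions/{{ ds }}"
        ),
    )

    sqoop_import >> spark_transform
```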

Environment: Apache Hadoop, HBase, Hive, Pig, Sqoop, ZooKeeper, Hortonworks, NoSQL, Storm, Microsoft SQL Server 2012, ETL, YARN, Apache Airflow, MapReduce, Tableau Server 10.1, HDFS, Scala, Impala, Flume, MySQL, JDK 1.6, J2EE, JDBC, Servlets, JSP, Struts 2.0, Spring 2.0, Hibernate, Python, WebLogic, SOAP, MongoDB, Spark.

Confidential, San Jose CA

Sr Hadoop Developer

Responsibilities:

  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data from various sources; imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed the SQL Server Integration Services (SSIS) packages as well as created stored procedures used in SSIS to load/transform data to the database.
  • Personalized the dashboards to reflect user-specific alerts and metrics to track Key Performance Indicators (KPI).
  • Created stored procedures to transform the data and worked extensively in T-SQL for the transformations needed while loading data.
  • Worked as a developer creating complex T-SQL, stored procedures, cursors, tables, views, and other SQL joins and statements for applications.
  • Involved in creating Hive Tables, loading data and writing Hive queries.
  • Utilized Apache Hadoop environment by Cloudera.
  • Created Data model for Hive tables.
  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, the HBase database, and Sqoop.
  • Wrote Spark applications in Scala to interact with the MySQL database through Spark SQLContext and accessed Hive tables through HiveContext.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs; implemented a six-node CDH4 Hadoop cluster on CentOS.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing; extensively involved in the design phase and delivered design documents.
  • Involved in testing and coordinated with the business on user testing.
  • Responsible for managing data coming from different sources.
  • Consumed XML messages from Kafka and processed them using Spark Streaming to capture UI updates (a streaming sketch follows this list).
  • Developed real-time ingestion of system and free-form remarks/messages using Kafka and Spark Streaming so that events are available in the customer's activity timeline view in real time.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Designed row keys and schemas for the NoSQL database HBase, with knowledge of Cassandra as another NoSQL database.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Designed an ETL data pipeline to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • Worked with NoSQL database HBase in getting real time data analytics using Apache Spark with Scala.
  • Used Impala to read, write, and query Hadoop data in HDFS from HBase or Cassandra, and configured Kafka to read and write messages from external programs.
  • Performed real-time analytics on big data using HBase and Cassandra in Kubernetes and Hadoop clusters.
  • Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
  • Used ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows; involved in unit testing and delivered unit test plans and results documents.
  • Developed Hive queries for Analysis across different banners.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Hive UDFs to bring all customer email IDs into a structured format.
  • Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization; worked on the Oozie workflow engine for job scheduling.
  • Used the Pentaho Data Integration tool for data integration, OLAP analysis, and ETL processes.
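
The Kafka-to-HDFS streaming ingestion described above was implemented in Scala with Spark Streaming; a rough PySpark Structured Streaming equivalent is sketched below, assuming the spark-sql-kafka connector is on the classpath. The broker, topic, and paths are hypothetical placeholders.

```python
# Minimal PySpark Structured Streaming sketch: read XML messages from a
# Kafka topic and land the raw payloads in HDFS. Broker, topic, and paths
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs-stream").getOrCreate()

# Subscribe to the Kafka topic carrying the XML remarks/messages.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "customer-remarks")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as bytes; keep them as strings for downstream parsing.
messages = raw.select(col("value").cast("string").alias("value"))

# Append each micro-batch to HDFS; the checkpoint dir makes the stream recoverable.
query = (
    messages.writeStream
    .format("text")
    .option("path", "hdfs:///data/streams/customer-remarks")
    .option("checkpointLocation", "hdfs:///checkpoints/customer-remarks")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```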

Environment: CDH4 with Hadoop, HDFS, Pig, Cloudera, Hive, HBase, Kafka, ZooKeeper, MapReduce, Java, Sqoop, Oozie, Hortonworks, Microsoft SQL Server 2008 R2, MS Windows Server 2008, Storm, Impala, ETL, CSS, Ambari, NoSQL, AWS (Amazon Web Services), Linux, UNIX Shell Scripting and Big Data.

Confidential, Springfield, IL

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for running Hadoop Streaming jobs to process terabytes of XML data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Analyzed the existing ETL process and came up with an ETL design document that listed the jobs to load, the logic to load and the frequency of load of all the tables.
  • Analyzed, designed, developed, implemented, and maintained parallel jobs using the Enterprise Edition of DataStage.
  • Developed complex jobs using various stages like Lookup, Join, Merge, Sort, Transformer, Dataset, Row Generator, Column Generator, Sequential File, and Aggregator.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of the ETL logic.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Exported the analyzed patterns back to Teradata using Sqoop.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Designed HBase row keys to store text and JSON as key values and to support retrieving/scanning them in sorted order.
  • Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in loading data from UNIX file system to HDFS.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
  • Analyzed data using Hadoop components Hive and Pig.
  • Responsible for creating Hive tables, loading data, and writing Hive queries.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website (a query sketch follows this list).
  • Developed applications using Hadoop Impala, Hive, Sqoop, Oozie, Java MapReduce, Spark SQL, HDFS, Pig, and Tez.
  • Exported the analyzed data to relational databases using Sqoop for reporting and visualization.
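
The web-log analysis referenced above can be sketched as a HiveQL query run through Spark SQL with Hive support; the database, table, and column names here are hypothetical placeholders.

```python
# Minimal sketch: HiveQL-style web-log analysis via Spark SQL.
# Table name (weblogs.access_log) and columns (visitor_id, log_date,
# duration_sec) are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("weblog-analysis")
    .enableHiveSupport()   # lets Spark SQL read Hive metastore tables
    .getOrCreate()
)

# Unique visitors, page views, and average visit duration per day.
daily_stats = spark.sql("""
    SELECT
        log_date,
        COUNT(DISTINCT visitor_id) AS unique_visitors,
        COUNT(*)                   AS page_views,
        AVG(duration_sec)          AS avg_visit_duration_sec
    FROM weblogs.access_log
    GROUP BY log_date
    ORDER BY log_date
""")

daily_stats.show(30, truncate=False)
spark.stop()
```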

Environment: Hadoop cluster, IBM InfoSphere DataStage, QualityStage & Information Analyzer 8.0.1, HDFS, Hive, Pig, Sqoop, Hadoop MapReduce, HBase, Linux, UNIX Shell Scripting and Big Data.

Confidential

Hadoop Developer

Responsibilities:

  • Converted the existing relational database model to the Hadoop ecosystem.
  • Generated datasets and loaded them into the Hadoop ecosystem.
  • Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Involved in review of functional and non-functional requirements.
  • Implemented frameworks using Java and Python to automate the ingestion flow (a Python sketch follows this list).
  • Responsible for managing data coming from different sources.
  • Loaded CDRs from the relational database using Sqoop and from other sources into the Hadoop cluster using Flume.
  • Processed large volumes of data and executed processes in parallel using Talend functionality.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Creating Hive tables and working on them using HiveQL.
  • Developed data pipeline using Kafka and Storm to store data into HDFS.
  • Created reporting views in Impala using Sentry policy files.
  • Developed Hive queries to analyze the output data.
  • Handled cluster coordination services through ZooKeeper.
  • Created a front-end application using JSPs and Spring MVC for registering new transactions and configured it to connect to the production database.
  • Developed, enhanced, and tested webMethods flow services and Java services.
  • Used web services for interaction between various components and created SOAP envelopes.
  • Created web service XML connectors for use within a flow service.
  • Developed and provided support for many components of this application end to end, i.e., from the front end (view) through webMethods to the database.
  • Provided solutions for bug fixes in this application.
  • Wrote SQL queries and stored procedures, and used JDBC for database connectivity with MySQL Server.
  • Developed queries and triggers in the database.
  • Used Spring Security for user authorization and implemented Spring Web Services.
  • Developed and configured Microsoft SQL Server 2008 tables including Sequences, Functions, Procedures and Table constraints.
  • Created standalone Java application to read data from several XLS files and insert data into the Database as needed by the Testing team.
  • Configured the Ant build tool to automate build processes for all environment types (Test, QA, and others).
  • Used TortoiseSVN as the version control tool for managing module development.
  • Collected log data from web servers and stored it in HDFS using Flume.
  • Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Implemented several Akka actors responsible for loading data into Hive.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Supported the existing MapReduce programs running on the cluster.
  • Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka, and Hazelcast.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Wrote Java code to format XML documents and upload them to the Solr server for indexing.
  • Involved in Hadoop cluster tasks like adding and removing nodes without affecting running jobs or data.
  • Developed PowerCenter mappings to extract data from various databases and flat files and load it into the data mart using Informatica.
  • Followed agile methodology for the entire project.
  • Installed and configured the Apache Hadoop, Hive, and Pig environments.
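
The Python ingestion automation mentioned above can be sketched roughly as follows, assuming the Sqoop and Hive command-line tools are available on the node; the JDBC URL, credentials file, table list, and Hive database are hypothetical placeholders.

```python
# Minimal Python sketch: automate a Sqoop-based ingestion flow and load the
# results into Hive. The JDBC URL, credentials file, table list, and Hive
# database below are hypothetical placeholders.
import subprocess

JDBC_URL = "jdbc:mysql://db-host/billing"
TABLES = ["call_detail_records", "subscribers"]
HIVE_DB = "staging"


def run(cmd):
    """Run a shell command and fail loudly if it exits non-zero."""
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)


def ingest_table(table):
    # Land the RDBMS table in HDFS with Sqoop.
    run([
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.db_password",
        "--table", table,
        "--target-dir", f"/data/raw/{table}",
        "-m", "4",
    ])
    # Load the landed files into a pre-existing Hive staging table.
    run([
        "hive", "-e",
        f"LOAD DATA INPATH '/data/raw/{table}' INTO TABLE {HIVE_DB}.{table}",
    ])


if __name__ == "__main__":
    for table in TABLES:
        ingest_table(table)
```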

Environment: Hadoop, Hortonworks, HDFS, Pig, Hive, Flume, Sqoop, Ambari, Ranger, Python, Akka, Play Framework, Informatica, Elasticsearch, Linux (Ubuntu), Solr, Java, JSP, Spring, Hibernate, JavaScript, HQL, Struts, Servlets, Axis, Eclipse, Ant, JDBC, Web Services, MySQL, WebLogic 8.1, Oracle 9i, SQL*Plus.
