
Hadoop Developer Resume


Charlotte, NC

PROFESSIONAL SUMMARY:

  • 8+ years of professional IT experience across the full Software Development Life Cycle (SDLC) and Agile methodology, covering analysis, design, development, testing, implementation and maintenance in Hadoop, data warehousing, Linux and Java.
  • 2+ years as a Business Intelligence (BI) developer on various projects, in technical environments that include but are not limited to the Microsoft BI suite.
  • 3+ years of experience as a Hadoop Developer, with hands-on experience in the Hadoop ecosystem.
  • Extensive understanding and knowledge of Hadoop components and daemons such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and the HDFS framework.
  • Experience in Data architecture including data acquisition, data ingestion, data pipelines, data analytics and analysis.
  • Capable of processing huge structured, unstructured and semi-structured datasets.
  • Implemented MapReduce jobs in Java using several MapReduce mechanisms such as combiners, partitioners, joins, data compression and the distributed cache. In-depth knowledge of the HDFS file system.
  • Wrote custom data types, input and output formats.
  • Exported data to relational databases and imported data from them using Sqoop.
  • Used Flume to load weblog data into HDFS.
  • Experience in extracting source data from sequential files and from XML, JSON, Avro and Parquet file formats, then transforming and loading it into the target data warehouse.
  • Experience in validating and cleansing data using Pig statements, Hive queries and various UDFs, with the help of dynamic and static partitions, to meet business requirements.
  • Created Kafka instances to deliver real-time transactions, and data in general, to a central location (HDFS), and handled the clusters for processing.
  • Automated jobs and implemented workflows using Oozie.
  • Installed and administered Hadoop clusters using Hue and Cloudera Manager.
  • Knowledge of NoSQL and hands-on experience with HBase and MongoDB.
  • Responsible for configuring Kafka clusters to stream data from multiple sources using the publish-subscribe model.
  • Knowledge of SQL and hands-on experience with MySQL and Microsoft SQL Server.
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
  • Good experience setting up Hadoop clusters on AWS.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experience working at the client location on MSBI projects, with extensive use of ETL and reporting tools such as SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS).
  • Worked extensively on data extraction and transformation using SSIS. Proven proficiency with transformations and tasks such as Derived Column, Conditional Split, Aggregate, Multicast, Sort, Execute SQL Task, Foreach Loop, Slowly Changing Dimension, Union All, Merge Join and Data Conversion.
  • Proficient in OLTP, OLAP and data warehouse design concepts.
  • Experience in loading the data into Data Warehouse after applying the business logic and transforming the data
  • Worked on multiple projects as a BI Developer, designing ETL and developing transformations.
  • Experience with packages and reports in the development and implementation stages of projects.
  • Hands-on experience creating workflows with tasks, containers and various transformations in SSIS.
  • Good exposure to identifying dimensions and facts to implement data warehouse architecture.
  • Experience in high-level design of ETL DTS packages and SSIS packages for integrating data over OLE DB connections from heterogeneous sources (Excel, CSV, Oracle, flat files, text-format data) using multiple transformations provided by SSIS.
  • Experience and understanding of SSAS, OLAP Cubes, Measure Groups, Dimensions, Calculations, Partitions, Perspectives, Aggregations and Hierarchies
  • Created parameterized reports that apply report criteria across various reports.
  • Designed reports, sub-reports and drill-through reports using features such as charts, graphs and filters.
  • Assisted the testing team and facilitated user acceptance testing (UAT).
  • Knowledge of planning, designing, developing and deploying data warehouses and data marts, with experience in both relational and multidimensional databases.
  • Extensively used ETL methodology for performing Data Profiling, Data Migration, Extraction, Transformation and loading using SSIS and designed data conversions from wide variety of source systems
  • Good knowledge of creating JIL scripts in AutoSys and scheduling jobs to run at the desired time.
  • Good team player, strong interpersonal and communication skills combined with self-motivation, initiative and the ability to think outside the box

TECHNICAL SKILLS:

Databases/BI Tools: SQL Server 2005/2008 R2/2012/2014/2016, Access, Data Warehouse, SSIS, SSRS, SSAS

Web Technologies: HTML5, CSS3

Big Data: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Impala, HBase, Cassandra, Oozie, ZooKeeper, YARN, Kafka, AWS

Programming Languages: SQL, T-SQL, MySQL, Java, HQL, Spark SQL, Pig Latin

IDE: Eclipse, Microsoft Visual Studio

Operating System: Windows Vista, XP/2000-2010, Linux, Ubuntu, Mac OS X

WORK EXPERIENCE:

Hadoop Developer

Confidential, Charlotte, NC

Responsibilities:

  • Applied performance optimization techniques on large datasets, such as partitioning, bucketing, efficient joins and transformations.
  • Created Hive, HBase tables and Hive integrated HBase tables as per the design using ORC file format and Snappy compression.
  • Imported required tables from RDBMS to HDFS using Sqoop and used Storm to get real time streaming of data into HBase.
  • Wrote Pig scripts to clean up the ingested data and created partitions for the daily data on Hive tables.
  • Wrote UDFs in Java to convert date formats and to create hash values using the MD5 algorithm, and used various UDFs from Piggybank and other sources.
  • Wrote Oozie workflows to run multiple Hive, shell script and Pig jobs that run independently based on time and data availability.
  • Developed custom MapReduce programs to analyze data and used Pig Latin to clean up unwanted data.
  • Wrote Hive and Pig scripts for complex transformations and implemented custom Hive/Pig UDFs to achieve comprehensive data analysis.
  • Developed Spark programs, scripts and UDFs using Scala/Spark SQL for aggregation operations as per the requirements.
  • Used the Spark DataFrame API to perform analytics on Hive data and checkpointed RDDs to disk to handle job failures and aid debugging.
  • Used Apache Kafka to push huge amounts of data to Cassandra.
  • Resolved production issues regarding performance and memory by using tuning parameters.
  • Implemented Spark RDD transformations and actions for business analysis, and worked with Spark accumulators and broadcast variables.
  • Used Spark Streaming to divide streaming data into batches as an input to spark engine for batch processing.
  • Collected data from an AWS S3 bucket in near-real-time using Spark Streaming, performed the necessary transformations and aggregations to build the data model and persisted the data in HDFS.
  • Worked with Amazon Web Services (AWS) EC2 and S3 from Spark RDD jobs.

Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, HBase, Phoenix, Pig, Sqoop, Spark, Oozie, Solr, SQL, Java (JDK 1.6), Scala, AWS

Java/Hadoop Developer

Confidential

Responsibilities:

  • Attended daily meetings with the customer to determine exact requirements and provided technical solutions to meet them.
  • Gathered and analyzed business and technical requirements.
  • Loaded data into Hadoop environment integrating with Informatica.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Automated the process for extraction of data from warehouses and weblogs by developing work-flows and coordinator jobs in Oozie.
  • Tuned Spark Streaming performance by setting the right batch interval, the correct level of parallelism, the appropriate serialization and memory settings.
  • Optimized performance on large datasets by partitioning during the ingestion process itself.
  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Developed Map Reduce jobs to convert data files into Parquet file format.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Developed business-specific custom UDFs in Hive and Pig.

Environment: Informatica, MapReduce, EDW, Hive Queries, Oozie, Pig, Spark, Scala, Spark SQL, Tableau, ODBC, HDFS, Partitions, Yarn.

Hadoop Developer

Confidential

Responsibilities:

  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries, which internally run MapReduce jobs.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in running Hadoop jobs to process millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Involved in loading data from LINUX file system to HDFS.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for managing data from multiple sources.
  • Extracted data from MySQL/DB2 through Sqoop, placed it in HDFS and processed it.
  • Experienced in managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.

Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, DB2, Oozie, MySQL, Linux, Ubuntu.

Java Developer

Confidential

Responsibilities:

  • Participated in the requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
  • Involved in developing the application using Core Java, J2EE and JSPs.
  • Developed a web-based application, EMR, in the J2EE framework, using Hibernate for persistence, Spring for dependency injection and JUnit for testing.
  • Used JSP to develop the front-end screens of the application.
  • Designed and developed several SQL Scripts, Stored Procedures, Packages and Triggers for the Database.
  • Used Indexing techniques in the database procedures to obtain search results.
  • Involved in development of Web Service client to get client details from third party agencies.
  • Developed nightly batch jobs which involved interfacing with external third party state agencies.
  • Developed test scripts for performance and accessibility testing of the application.
  • Responsible for deploying the application in the client UAT environment.
  • Involved in different types of testing, such as unit, system and integration testing, during the testing phase.

Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, Eclipse, Subversion, PL/SQL, WebSphere, UML, Windows.

BI Developer

Confidential

Responsibilities:

  • Migrated DTS packages (SQL Server 2000) to SQL Server Integration Services (SQL Server 2005).
  • Loaded data into dimension, fact and aggregate tables by creating SSIS packages.
  • Involved in developing transformations and tasks such as Aggregate, Conditional Split, Lookup, Execute SQL Task, Foreach Loop, Slowly Changing Dimension, Union All, Merge Join and Data Conversion.
  • Created packages using the required transformations in SSIS.
  • Created tabular, chart and parameterized reports using SSRS.
  • Designed reports, sub-reports and drill-through reports using features such as charts, graphs and filters.
  • Studied the client's traditional database and the architecture of the target data warehouse.
  • Developed OLAP Cubes by using SQL Server Analysis Services (SSAS).
  • Designed cubes for reporting in SSAS as per the requirements, including calculated members, the required hierarchies and MDX queries.

Environment: SSIS, SSRS, SSAS, SQL Server 2005, DTS

BI Developer

Confidential

Responsibilities:

  • Involved in building Data Marts and OLAP Cubes.
  • Analyzed the source map documents and created packages to move data from source to destination.
  • Created packages implementing ETL using control flow items (Foreach Loop, For Loop, Execute SQL) and data flow transformations (Conditional Split, Multicast, Derived Column, Lookup, Sort, Union All, etc.). Handled errors while moving the data.
  • Scheduled the ETL Package (Monthly) Using SQL Server 2005 Management Studio.
  • Developed the package for the ETL process.
  • Involved in creation of the package and was responsible for moving the package from development server to production server.
  • Generated multiple Enterprise reports using SSRS from SQL Server Database (OLTP) and SQL Server Analysis Services Database (OLAP) and included various reporting features such as group by, drilldowns, drill through, sub-reports, navigation reports etc.
  • Extensively involved in SSAS storage, partitions and aggregations, in writing calculations and queries with MDX, and in developing reports using MDX and SQL.

Environment: SSIS, SSRS, SQL Server, SSAS
