
Hadoop Developer Resume


Lisle, IL

PROFESSIONAL SUMMARY:

  • Over 7 years of IT experience with multinational clients, including 6 years of Big Data architecture experience developing Spark/Hadoop applications.
  • Hands on experience with the Hadoop stack (MapReduce, Pig, Hive, Sqoop, HBase, Flume, Oozie).
  • Proven expertise in performing analytics on Big Data using MapReduce, Hive and Pig.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Python (see the PySpark sketch after this list).
  • Developed Apache Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Experienced in Spark Core, Spark RDD, Pair RDD, Spark Deployment Architectures.
  • Experienced in performing real-time analytics on NoSQL databases like HBase and Cassandra.
  • Good knowledge in working with Impala, Storm and Kafka.
  • Experienced with dimensional modeling, data migration, data cleansing, data profiling and ETL processes for data warehouses.
  • Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
  • Experienced in importing and exporting data from RDBMS into HDFS using Sqoop.
  • Analyzed large amounts of data sets using Pig scripts and Hive scripts.
  • Good working knowledge on processing Batch applications.
  • Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
  • Used Flume to channel data from different sources to HDFS.
  • Experienced with Maven, Jenkins and GIT.
  • Experience in importing and exporting data from RDBMS to HDFS, Hive tables and HBase by using Sqoop.
  • Experience in importing streaming data into HDFS using Flume sources and sinks, and transforming the data with Flume interceptors.
  • Exposure to Apache Kafka for building data pipelines that move logs as a stream of messages using producers and consumers (a minimal producer/consumer sketch also follows this list).
  • Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
  • Knowledge of handling Kafka clusters; created several topologies to support real-time processing requirements.
  • Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
  • Experience using different file formats: Avro, SequenceFiles, ORC, JSON and Parquet.
  • Experience in Performance Tuning, Optimization and Customization.
  • Experience with unit testing MapReduce programs using MRUnit and JUnit.
  • Experience in active development as well as onsite coordination in web-based, client/server and distributed architectures using Java and J2EE, including web services, Spring, Struts, Hibernate and JSP/Servlets, incorporating MVC architecture.
  • Developed SSRS reports on SQL Server; good understanding of SSAS, OLAP cubes and architecture.
  • Worked on projects for leading clients, covering requirement gathering, estimation, analysis of functional specifications, technical design specification and review, development, testing and implementation.
  • Good working knowledge of servers such as Tomcat and WebLogic 8.0.
  • Extensively worked on Java development tools, which includes Eclipse Galileo 3.5, Eclipse Helios 3.6, Eclipse Mars 4.5, WSAD 5.1.2.
  • Ability to work in teams as well as individually; quick learner and able to meet deadlines.
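
The following is a minimal, hedged sketch of the MapReduce-to-Spark-RDD migration called out above: a classic word-count style job re-expressed as Python RDD transformations. The HDFS paths are hypothetical placeholders, not taken from the original project.

```python
from pyspark import SparkContext

sc = SparkContext(appName="MapReduceMigrationPOC")

# The map and reduce phases of the original job become flatMap/map/reduceByKey on an RDD.
counts = (sc.textFile("hdfs:///data/input/*.txt")       # input split, as in MapReduce
            .flatMap(lambda line: line.split())          # "mapper": emit one token per word
            .map(lambda word: (word, 1))                 # mapper output: (key, 1)
            .reduceByKey(lambda a, b: a + b))            # "reducer": sum counts per key

counts.saveAsTextFile("hdfs:///data/output/wordcount")
```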
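
Similarly, a minimal sketch of a Kafka log pipeline with a producer and a consumer, assuming the kafka-python client; the broker address, topic name and group id are placeholders.

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish log lines as a stream of messages (broker and topic names assumed).
producer = KafkaProducer(bootstrap_servers="broker:9092")
producer.send("app-logs", b'{"level": "ERROR", "msg": "request timeout"}')
producer.flush()

# Consumer: read the same stream for downstream processing.
consumer = KafkaConsumer("app-logs",
                         bootstrap_servers="broker:9092",
                         group_id="log-pipeline",
                         auto_offset_reset="earliest")
for message in consumer:
    print(message.value)        # hand off to the processing layer here
```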

TECHNICAL SKILLS:

Hadoop Ecosystem: Hadoop, MapReduce, YARN, Spark, Sqoop, Hive, Oozie, PySpark, Pig, HDFS, Flume, Impala, Storm, Apache Kafka, Solr, HBase.

Java Technologies: Java 5, Java 6, JAXP, AJAX, I18N, JFC Swing, Log4j, Java Help API.

J2EE Technologies: JSP 2.1 Servlets 2.3, JDBC 2.0, JNDI, XML, JAXP, Java Beans.

Frameworks: MVC, Struts, Hibernate.

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0.

SDLC Methodologies: Waterfall, Agile/Scrum.

MS Office Suite: MS Word, MS Excel, MS PowerPoint, MS Access, MS Project, MS Visio.

Database Management: SPSS, MS Excel, MS Access, Oracle 10g, DB2, MongoDB.

Operating Systems: Windows 95/98/2000/XP/Vista/7/8/10, Linux, Macintosh.

Programming Languages: C, Java 6.0, Scala 2.11, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting.

Reporting Tools: Tableau, Power BI, SSRS

WORK EXPERIENCE:

Hadoop Developer

Confidential, Lisle, IL

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Evaluated Spark's performance against Impala on transactional data. Used Spark transformations and aggregations in Python and Scala to compute min, max and average on transactional data (a minimal PySpark sketch follows this list).
  • Migrated HiveQL queries to Spark SQL using PySpark.
  • Experienced in handling Hive queries using Spark SQL that integrate with Spark environment.
  • Used Java to develop a RESTful API for the database utility project. Responsible for performing extensive data validation using Hive.
  • Designed a data model in Cassandra (POC) for storing server performance data.
  • Implemented a Data service as a rest API project to retrieve server utilization data from this Cassandra Table.
  • Implemented a Python script to call the Cassandra REST API, performed transformations and loaded the data into Hive (a hedged sketch of this flow also follows this list).
  • Designed a data model to ingest transactional data with and without URIs into Cassandra.
  • Implemented a shell script that calls a Python script to compute min, max and average on utilization data for thousands of hosts, and compared performance at various levels of summarization.
  • Involved in creating Oozie workflow and Coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
  • Generated reports from this Hive table for visualization purposes.
  • Migrated HiveQL to Spark SQL to compare Spark's performance with Hive's. Implemented proofs of concept for DynamoDB, Redshift and EMR.
  • Proactively researched Microsoft Azure and presented a demo giving an overview of cloud computing with Azure.
  • Enhanced the PySpark code to replace Spark with Impala.
  • Performed installation for Impala on the Edge node.
  • Developed PySpark code to read data from Hive, group the fields and generate XML files.
  • Imported data from different data sources to create Power BI reports and dashboards.
  • Built and published customized interactive reports and dashboards, and scheduled reports using Tableau Server; created new schedules and checked tasks daily on the server.
  • Worked on dimensional modeling, ER modeling, data marts, star/snowflake schemas, and fact and dimension tables in SSAS and AtScale.
  • Implemented complex DAX calculations for SSAS Tabular model.
  • Developed visuals/dashboards to convey the story inside the data, including Tableau visualizations using crosstabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts and density charts.
  • Used Power BI custom visuals such as Power KPI, Chiclet Slicer, Infographic Designer and Spline Chart to make the visuals easier to read and understand.
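
A hedged PySpark sketch of the min/max/average aggregations described above, using the DataFrame API against a Hive table; the database, table and column names are assumed for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("TransactionalAggregates")
         .enableHiveSupport()
         .getOrCreate())

# Hive table and columns are hypothetical placeholders.
txns = spark.table("perf.transactional_data")

summary = (txns.groupBy("host")
               .agg(F.min("response_ms").alias("min_ms"),
                    F.max("response_ms").alias("max_ms"),
                    F.avg("response_ms").alias("avg_ms")))

summary.write.mode("overwrite").saveAsTable("perf.transactional_summary")
```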
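
And a hedged sketch of the Python flow that calls the Cassandra-backed REST data service, transforms the response and loads it into Hive. The endpoint URL, JSON shape and Hive table name are hypothetical; the real service is internal.

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Call the REST data service (URL and parameters are placeholders).
resp = requests.get("http://dataservice.example.internal/api/v1/utilization",
                    params={"window": "1h"}, timeout=30)
resp.raise_for_status()
records = resp.json()    # assumed to be a list of {"host": ..., "cpu_pct": ...} dicts

# Light transformation before the Hive load.
rows = [(r["host"], float(r["cpu_pct"])) for r in records]
df = spark.createDataFrame(rows, ["host", "cpu_pct"])

# Target Hive table is assumed to already exist.
df.write.mode("append").insertInto("perf.host_utilization")
```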

Environment: Hadoop, Microsoft Azure, AWS, HDFS, Hive, Hue, Oozie, Java, Linux, Cassandra, Python, OpenTSDB, Scala, Impala, PySpark.

Hadoop Developer

Confidential, Chicago, IL

Responsibilities:

  • Worked with team members on upgrading, configuring and maintaining various Hadoop infrastructure components such as Pig, Hive and HBase.
  • Wrote Elasticsearch templates for the index patterns.
  • Implemented Spark jobs to extract data from Hive.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Extracted files from a NoSQL database (MongoDB) and processed them with Spark using the MongoDB Spark Connector.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Wrote Hive queries on the analyzed data for aggregation and reporting.
  • Involved in creating Hive tables and loading them with data.
  • Imported and exported data from different databases into HDFS and Hive using Sqoop.
  • Used Sqoop for loading existing metadata in Oracle to HDFS.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (a minimal UDF sketch follows this list).
  • Used the BI tools Tableau and Power BI to generate dashboard reports and data visualizations.
  • Implemented MapReduce jobs using the Java API and Pig Latin.
  • Participated in the setup and deployment of Hadoop cluster.
  • Wrote Python code using the happybase library to connect to HBase, and also used HAWQ for querying (a connection sketch also follows this list).
  • Involved in loading data from the UNIX file system and FTP to HDFS.
  • Hands on design and development of an application using Hive (UDF).
  • Developed Pig UDFs to pre-process the data for analysis.
  • Wrote client web applications using SOAP web services.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Installed Hadoop, MapReduce, HDFS and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
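
A minimal, hedged sketch of the Spark DataFrame/UDF pattern mentioned above; the table, columns and staging path are assumed, and the final push to the RDBMS would be a separate sqoop export step.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Simple Python UDF: bucket order amounts into labelled bands (logic is illustrative only).
def amount_band(amount):
    return "high" if amount is not None and amount >= 1000 else "standard"

amount_band_udf = udf(amount_band, StringType())

orders = spark.table("sales.orders")                     # Hive table name assumed
banded = orders.withColumn("band", amount_band_udf(F.col("amount")))

summary = banded.groupBy("band").agg(F.sum("amount").alias("total_amount"))

# Land the aggregate in HDFS; a Sqoop export would then push this staging data to the RDBMS.
summary.write.mode("overwrite").csv("hdfs:///staging/order_band_summary")
```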
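
A hedged sketch of connecting to HBase from Python with the happybase library (which talks to the HBase Thrift server); the host, table, row key and column family are placeholders.

```python
import happybase

# Connect to the HBase Thrift service (hostname and port are assumptions).
connection = happybase.Connection(host="hbase-thrift.example.internal", port=9090)
table = connection.table("server_metrics")

# Write one cell, then read the row back.
table.put(b"host001#2017-01-01", {b"cf:cpu_pct": b"73.5"})
row = table.row(b"host001#2017-01-01")
print(row[b"cf:cpu_pct"])

connection.close()
```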

Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark-Streaming, Spark SQL, Apache Kafka, Flume, Zookeeper, Oozie, Yarn, Linux, Sqoop, Java, Scala, Tableau, Python, SOAP, REST, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting and Cassandra.

Hadoop developer

Confidential, Chicago, IL

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on importing data from various sources and performed transformations using MapReduce, HIVE to load data into HDFS.
  • Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Worked on setting up Pig, Hive and HBase on multiple nodes and developed using Pig, Hive, HBase and MapReduce.
  • Solved the small-file problem using SequenceFile processing in MapReduce.
  • Wrote various Hive and Pig scripts.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Performed real-time analytics on HBase using the Java API and REST API.
  • Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
  • Worked on compression mechanisms to optimize MapReduce Jobs.
  • Analyzed customer behavior by performing clickstream analysis, using Flume to ingest the data.
  • Experienced with working on Avro Data files using Avro Serialization system.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hortonworks, MapReduce, HBase, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.

Hadoop developer

Confidential, Irvine, CA

Responsibilities:

  • Exported the analyzed data to relational databases using Sqoop, and processed the data for visualization and report generation for the BI team.
  • Stored data from HDFS to respective Hive tables for business analysts to conduct further analysis in identifying data trends.
  • Developed ad-hoc Hive queries and filtered data to increase the effectiveness of process execution, using constructs such as joins, GROUP BY and HAVING.
  • Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
  • Improved HiveQL execution time by partitioning the data, and further reduced run time by applying compression techniques such as SNAPPY to the MapReduce jobs (a hedged partitioning/compression sketch follows this list).
  • Created Hive partitions to store data for different trends under separate partitions.
  • Connected the Hive tables to data analysis tools such as Tableau for graphical representation of the trends.
  • Worked on the Oozie workflow engine for job scheduling.
  • Designed the Solr Schema, and used the Solr client API for storing, indexing, querying the schema fields.
  • Loaded data into HBase using bulk load and the HBase API.
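
A hedged illustration of the Hive partitioning and SNAPPY compression described above. The HiveQL is shown here as strings submitted through PySpark's SQL interface purely to keep the example self-contained; the original work ran comparable HiveQL directly, and the database, table and column names are assumed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Partitioned table stored as ORC with SNAPPY compression (names are placeholders).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.trend_events (
        event_id   STRING,
        event_type STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
    TBLPROPERTIES ('orc.compress' = 'SNAPPY')
""")

# Dynamic-partition insert so each day's trend data lands in its own partition.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.trend_events PARTITION (event_date)
    SELECT event_id, event_type, amount, event_date
    FROM analytics.raw_events
""")
```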

Environment: Hive, SQL, Pig, Flume, MapReduce, Sqoop, Java, Shell Scripting, Teradata, Oracle, Oozie, Cassandra

Java Developer

Confidential

Responsibilities:

  • Involved in code reviews and mentored the team in resolving issues.
  • Responsible and active in the analysis, design, implementation and deployment of full software development life-cycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Developed struts action classes, action forms and performed action mapping using Struts Framework and performed data validation in form beans and action classes.
  • Involved in multi-tiered J2EE design utilizing MVC architecture (Struts Framework) and Hibernate.
  • Extensively used Struts Framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
  • Involved in system design and development in core java using Collections, multithreading.
  • Defined the search criteria, retrieved the customer record from the database, made the required changes and saved the updated information back to the database.
  • Wrote JavaScript validations to validate the fields of the user registration screen and login screen.
  • Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
  • Used DAO and JDBC for database access.
  • Developed applications with ANT based build scripts.
  • Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in postproduction support and maintenance of the application.

Environment: Oracle 11g, Java 1.5, Struts 1.2, Servlets, HTML, XML, MS SQL Server 2005, J2EE, JUnit, Tomcat 6.
