Hadoop/Spark Developer Resume

Chicago, IL

SUMMARY:

  • Over 4 years of IT experience including Big Data technologies, Web Application development and Business Intelligence.
  • Experience in deploying and managing a multi-node MapR Hadoop cluster with its different components (MFS, NFS, CLDB, Web server, Spark, ResourceManager, NodeManager, Hive, HBase, ZooKeeper, History Server) using a manual install.
  • Experience working with the Cloudera and Hortonworks distributions of Hadoop.
  • Exposure to Spark, Spark Streaming, and Scala; implemented Spark applications in Scala using DataFrames, the Spark SQL API, and pair RDDs for faster data processing.
  • Experienced in using Scala and Spark to improve the performance and optimization of existing algorithms in Hadoop, working with SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
  • Hands-on experience in writing Python scripts.
  • Experience in converting SQL queries into Spark transformations using RDDs and Scala, and performing map-side joins on RDDs (see the sketch following this list).
  • Extensive experience in importing and exporting data using stream-processing platforms like Flume and Kafka.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Good knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
  • Experienced in loading data into Hive partitions and creating buckets in Hive.
  • Experience in handling messaging services using Apache Kafka.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Highly motivated team player with zeal to learn new technologies.
  • Experience in all phases of the Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall and Agile methodologies.
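
A minimal Spark/Scala sketch of the kind of work described above: expressing a SQL-style query as DataFrame transformations and forcing a map-side (broadcast) join. The table names, columns, and output path are illustrative assumptions, not taken from an actual project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object OrdersByRegion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("OrdersByRegion")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Illustrative inputs: a large fact table and a small dimension table
        val orders  = spark.table("sales.orders")
        val regions = spark.table("sales.regions")   // small enough to broadcast

        // Equivalent of: SELECT r.region_name, SUM(o.amount)
        //                FROM orders o JOIN regions r USING (region_id) GROUP BY r.region_name
        // broadcast() keeps the join on the map side and avoids shuffling the large table
        val totals = orders
          .join(broadcast(regions), Seq("region_id"))
          .groupBy($"region_name")
          .sum("amount")

        totals.write.mode("overwrite").parquet("/data/output/orders_by_region")
        spark.stop()
      }
    }

Broadcasting the smaller side is the DataFrame analogue of the map-side joins done on pair RDDs with a broadcast variable.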

TECHNICAL SKILLS:

Hadoop/MapR Ecosystem: HBase, MapR-DB (binary and document), MFS, HDFS, MapReduce, YARN, Spark, MapR Control System (MCS), Sqoop, Hive, Pig, Cloudera Manager, ZooKeeper

Tool(s): Apache Tomcat 7.0, Maven, JIRA, Git, Hibernate, Microsoft SQL Server Management Studio, Oracle SQL Developer, MySQL Workbench, Eclipse

Language(s): Java 7/8, Scala, C#, C/C++, JavaScript, ABAP/4, Ruby, HTML5, CSS3

Database(s): Oracle 10g/11g/12c, MySQL, Microsoft SQL Server 2005/2008

Framework(s): Spring MVC, Spring Boot, Hibernate, ASP.NET MVC

PROFESSIONAL EXPERIENCE:

Hadoop/Spark Developer

Confidential, Chicago, IL

Responsibilities:

  • Extracted data from flat files and other RDBMS databases into a staging area and ingested it into Hadoop.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for coding batch pipelines, RESTful services, MapReduce programs, and Hive queries, as well as testing, debugging, peer code reviews, troubleshooting, and maintaining status reports.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Implemented complex MapReduce programs in Java to perform map-side joins using the Distributed Cache.
  • Wrote Flume configuration files to import streaming log data into HBase.
  • Performed masking of customer-sensitive data using Flume interceptors.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated visualizations using Tableau.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch below).
  • Installed the Oozie workflow engine and scheduled date/time-dependent Hive and Pig jobs.
  • Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.
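
A hedged sketch of the Hive table work described above: creating a partitioned table and loading one day's staged data into it. The database, table, and column names are assumptions; on the project the HiveQL ran through the Hive CLI as MapReduce jobs, while here it is issued from Scala through Spark's Hive support to keep all sketches in one language.

    import org.apache.spark.sql.SparkSession

    object HivePartitionLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitionLoad")
          .enableHiveSupport()   // requires a configured Hive metastore
          .getOrCreate()

        // Illustrative target table, partitioned by load date
        spark.sql("""
          CREATE TABLE IF NOT EXISTS analytics.customer_events (
            customer_id BIGINT,
            event_type  STRING,
            amount      DOUBLE
          )
          PARTITIONED BY (load_date STRING)
          STORED AS ORC
        """)

        // Load one day's staged records into the matching partition
        spark.sql("""
          INSERT OVERWRITE TABLE analytics.customer_events PARTITION (load_date = '2016-05-01')
          SELECT customer_id, event_type, amount
          FROM staging.customer_events_raw
          WHERE load_date = '2016-05-01'
        """)

        spark.stop()
      }
    }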

Environment: HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, HDP Distribution, Eclipse, Log4j, JUnit, Linux.

Hadoop Developer

Confidential, Basking Ridge, NJ

Responsibilities:

  • Installed and configured Hadoop, YARN, MapReduce, Flume, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs in Python for data cleaning.
  • Developed data pipeline using Flume, Sqoop, Pig and Python MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Extensive experience working with enterprise versions of the Cloudera distribution of Hadoop, and good knowledge of Amazon EMR (Elastic MapReduce).
  • Used AWS S3 and local disks as the underlying file system for Hadoop (HDFS).
  • Experience in deploying a scalable Hadoop cluster on AWS using S3 as the underlying file system for Hadoop.
  • Developed Python scripts to extract the data from the web server output files to load into HDFS.
  • Involved in HBase setup and storing data into HBase for further analysis.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Wrote Python MapReduce scripts for processing the unstructured data.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation (see the sketch below).
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
  • Responsible for creating Hive tables, loading data, and writing Hive queries.
  • Used forward engineering to create a Physical Data Model with DDL that best suits the requirements.
  • Worked with Sqoop to export analyzed data from the HDFS environment into an RDBMS for report generation and visualization purposes.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Maintained and monitored clusters. Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
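
The log work above was done with Flume and Python MapReduce jobs; as a rough equivalent in Scala (kept in the same language as the other sketches), the following Spark job reads raw web-server log lines from HDFS and aggregates request counts per page. The input path, output path, and log layout are assumptions.

    import org.apache.spark.sql.SparkSession

    object PageHitCounts {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("PageHitCounts").getOrCreate()
        val sc = spark.sparkContext

        // Assumed input: Flume-delivered access logs on HDFS; an s3a:// path works the
        // same way when S3 is used as the underlying file system
        val logs = sc.textFile("hdfs:///data/raw/access_logs/*")

        // Assumed layout: whitespace-separated fields with the requested URL in the
        // 7th column, as in a common Apache access-log format
        val hitsPerPage = logs
          .map(_.split("\\s+"))
          .filter(_.length > 6)
          .map(fields => (fields(6), 1L))
          .reduceByKey(_ + _)

        hitsPerPage.saveAsTextFile("hdfs:///data/analytics/page_hits")
        spark.stop()
      }
    }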

Environment: Cloudera, Cloudera Manager, HDFS, MapReduce, Hive, Impala, Pig Latin, Python, SQL, Sqoop, Flume, YARN, Linux, CentOS, HBase.

Java Developer

Confidential

Responsibilities:

  • Involved in full life cycle development in a distributed environment using Java and the J2EE framework.
  • Responsible for developing and modifying the existing service layer based on the business requirements.
  • Involved in designing and developing web services using SOAP and WSDL.
  • Involved in database design.
  • Created tables, views, triggers, and stored procedures in SQL for data manipulation and retrieval.
  • Developed Web Services for Payment Transaction and Payment Release.
  • Involved in Requirement Analysis, Development and Documentation.
  • Developed front-end using JSP, HTML, CSS and JavaScript.
  • Coded DAO objects using JDBC (DAO pattern).
  • Used XML and XSDs to define data formats.
  • Implemented J2EE design patterns such as Singleton and DAO across the presentation, business, and integration tiers of the project.
  • Involved in Bug fixing and functionality enhancements.
  • Followed coding and documentation standards and best practices.
  • Participated in project planning discussions and worked with team members to analyze the requirements and translate them into working software modules.

Environment: Java, J2EE, JSP, SOAP, WSDL, SQL, PL/SQL, XML, JDBC, Eclipse, Windows XP, Oracle
