Hadoop/Spark Developer Resume
Chicago, IL
SUMMARY:
- Over 4 years of IT experience spanning Big Data technologies, web application development, and business intelligence.
- Experience deploying and managing a multi-node MapR Hadoop cluster with its components (MFS, NFS, CLDB, web server, Spark, ResourceManager, NodeManager, Hive, HBase, ZooKeeper, History Server) using manual installation.
- Experience working with the Cloudera and Hortonworks distributions of Hadoop.
- Exposure to Spark, Spark Streaming, and Scala; implemented Spark applications in Scala using DataFrames, the Spark SQL API, and pair RDDs for faster data processing.
- Experienced in using Scala and Spark to improve the performance and optimization of existing Hadoop algorithms, working with SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
- Hands-on experience writing Python scripts.
- Experience converting SQL queries into Spark transformations using RDDs and Scala, including map-side joins on RDDs (see the Spark sketch after this list).
- Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good knowledge of job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experience in handling messaging services using Apache Kafka (see the producer sketch after this list).
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Highly motivated team player with zeal to learn new technologies.
- Experience in all phases of the Software Development Life Cycle (analysis, design, development, testing, and maintenance) using Waterfall and Agile methodologies.
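A minimal sketch of the SQL-to-Spark conversion and map-side (broadcast) join pattern claimed above, written against Spark's Java API so that every example in this document shares one language; the table names, columns, and paths are invented for illustration:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.broadcast;

    public class OrdersByRegion {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("orders-by-region").getOrCreate();

            // SQL being converted:
            //   SELECT r.name, SUM(o.amount) FROM orders o
            //   JOIN regions r ON o.region_id = r.id GROUP BY r.name
            Dataset<Row> orders = spark.read().parquet("/staging/orders");   // large fact table
            Dataset<Row> regions = spark.read().parquet("/staging/regions"); // small lookup table

            // broadcast() ships the small table to every executor, so the join
            // happens map-side with no shuffle of the large table.
            Dataset<Row> totals = orders
                    .join(broadcast(regions), orders.col("region_id").equalTo(regions.col("id")))
                    .groupBy(regions.col("name"))
                    .sum("amount");

            totals.show();
            spark.stop();
        }
    }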
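A similarly minimal producer sketch for the Kafka messaging work; the broker address, topic name, and payload are assumptions:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // illustrative broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Push one log event onto a topic for a downstream Flume/Spark consumer.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("web-logs", "host-01", "GET /index.html 200"));
            }
        }
    }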
TECHNICAL SKILLS:
Hadoop/MapR Ecosystem: HBase, MapR-DB (binary and document), MFS, HDFS, MapReduce, YARN, Spark, MapR Control System (MCS), Sqoop, Hive, Pig, Cloudera Manager, ZooKeeper
Tool(s): Apache Tomcat 7.0, Maven, JIRA, Git, Hibernate, Microsoft SQL Server Management Studio, Oracle SQL Developer, MySQL Workbench, Eclipse
Language(s): Java 7/8, Scala, C#, C/C++, JavaScript, ABAP/4, Ruby, HTML5, CSS3
Database(s): Oracle 10g/11g/12c, MySQL, Microsoft SQL Server 2005/2008
Framework(s): Spring MVC, Spring Boot, Hibernate, ASP.NET MVC
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Chicago, IL
Responsibilities:
- Extracted data from flat files and other relational databases into a staging area and ingested it into Hadoop.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Responsible for coding batch pipelines, RESTful services, MapReduce programs, and Hive queries, along with testing, debugging, peer code reviews, troubleshooting, and status reporting.
- Implemented MapReduce programs to classify data into different categories based on record type.
- Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache (see the first sketch after this list).
- Wrote Flume configuration files to import streaming log data into HBase.
- Performed masking of sensitive customer data using Flume interceptors.
- Involved in migrating tables from RDBMS into Hive using Sqoop, and later generated visualizations with Tableau.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the Hive sketch after this list).
- Installed the Oozie workflow engine and used it to schedule time- and data-dependent Hive and Pig jobs.
- Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.
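A condensed sketch of the distributed-cache map-side join mentioned above: the small table travels to every node via the distributed cache and is loaded once per mapper, so each record is joined in map() with no shuffle. The file layout and field positions are assumptions, and the driver wiring appears only in the comment:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Driver side: job.addCacheFile(new URI("/data/lookup/categories.txt#categories"));
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> categories = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // "categories" is the symlink name given after '#' in addCacheFile()
            try (BufferedReader reader = new BufferedReader(new FileReader("categories"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);
                    categories.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 2) return;                    // skip malformed lines
            String category = categories.get(fields[1]);      // join on the second column
            if (category != null) {
                context.write(new Text(fields[0]), new Text(category));
            }
        }
    }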
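And a sketch of the Hive table work, expressed here through the HiveServer2 JDBC driver for compactness; the actual queries compiled down to MapReduce under Hive, and the table layout below is illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveTableExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
                 Statement stmt = conn.createStatement()) {
                // Partitioned table over data landed by Sqoop.
                stmt.execute("CREATE TABLE IF NOT EXISTS customers (id BIGINT, name STRING, city STRING) "
                           + "PARTITIONED BY (load_date STRING) "
                           + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE");
                stmt.execute("LOAD DATA INPATH '/staging/customers/2016-01-01' "
                           + "INTO TABLE customers PARTITION (load_date='2016-01-01')");
                // Hive compiles this query into MapReduce jobs on the cluster.
                try (ResultSet rs = stmt.executeQuery("SELECT city, COUNT(*) FROM customers GROUP BY city")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }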
Environment: HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, HDP Distribution, Eclipse, Log4j, JUnit, Linux.
Hadoop Developer
Confidential, Basking Ridge, NJ
Responsibilities:
- Installed and configured Hadoop, YARN, MapReduce, Flume, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Python for data cleaning.
- Developed a data pipeline using Flume, Sqoop, Pig, and Python MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Extensive experience working with the Cloudera enterprise distribution of Hadoop, and good knowledge of Amazon EMR (Elastic MapReduce).
- Used AWS S3 and local disk as the underlying file system for Hadoop in place of HDFS.
- Experience deploying a scalable Hadoop cluster on AWS with S3 as the underlying file system.
- Developed Python scripts to extract data from web server output files and load it into HDFS.
- Involved in HBase setup and in storing data into HBase for further analysis.
- Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing data in HDFS.
- Wrote Python MapReduce scripts for processing the unstructured data.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation (see the sketch after this list).
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files.
- Responsible for creating Hive tables, loading data, and writing Hive queries.
- Used forward engineering to create a physical data model with DDL that best suited the requirements.
- Worked with Sqoop to export analyzed data from the HDFS environment into an RDBMS for report generation and visualization purposes.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
- Maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
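A compact sketch of the kind of search/aggregation MapReduce job described above (the cleaning jobs on this project were written in Python, presumably via Hadoop Streaming; this sketch is in Java to keep all examples in one language, and the log layout is an assumption):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Aggregation job: counts requests per URL from web-server logs loaded by Flume.
    public class UrlCount {

        public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text url = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(" ");
                if (fields.length > 6) {          // skip malformed lines
                    url.set(fields[6]);           // request path in combined log format
                    context.write(url, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }
    }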
Environment: Cloudera, Cloudera Manager, HDFS, MapReduce, Hive, Impala, Pig Latin, Python, SQL, Sqoop, Flume, YARN, Linux, CentOS, HBase.
Java Developer
Confidential
Responsibilities:
- Involved in full life-cycle development in a distributed environment using Java and the J2EE framework.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing and developing web services using SOAP and WSDL.
- Involved in database design.
- Created tables, views, triggers, and stored procedures in SQL for data manipulation and retrieval.
- Developed Web Services for Payment Transaction and Payment Release.
- Involved in requirement analysis, development, and documentation.
- Developed front-end using JSP, HTML, CSS and JavaScript.
- Coded DAO objects using JDBC, following the DAO pattern (see the sketch after this list).
- Used XML and XSDs to define data formats.
- Implemented J2EE design patterns such as Singleton and DAO across the presentation, business, and integration tiers of the project.
- Involved in Bug fixing and functionality enhancements.
- Followed coding and documentation standards and best practices.
- Participated in project planning discussions and worked with team members to analyze the requirements and translate them into working software modules.
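A small sketch of the DAO-over-JDBC pattern referenced above: all SQL access for payment records sits behind one class, so the service layer never touches JDBC directly. The table, columns, and the Payment value class are illustrative, not taken from the project:

    import java.math.BigDecimal;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.sql.DataSource;

    public class PaymentDao {

        // Simple value object; in the real project this would be a mapped entity.
        public static class Payment {
            public final long id;
            public final BigDecimal amount;
            public final String status;
            public Payment(long id, BigDecimal amount, String status) {
                this.id = id; this.amount = amount; this.status = status;
            }
        }

        private final DataSource dataSource;

        public PaymentDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public Payment findById(long id) throws SQLException {
            String sql = "SELECT id, amount, status FROM payments WHERE id = ?";
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next()
                            ? new Payment(rs.getLong("id"), rs.getBigDecimal("amount"), rs.getString("status"))
                            : null;
                }
            }
        }
    }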
Environment: Java, J2EE, JSP, SOAP, WSDL, SQL, PL/SQL, XML, JDBC, Eclipse, Windows XP, Oracle