Spark and Hadoop Developer Resume
Brentwood, TN
PROFESSIONAL SUMMARY:
- Overall 8 years of experience across the Software Development Life Cycle, including requirements gathering, documentation, analysis, development, testing and support. Over 4 years of extensive experience as a Hadoop Developer and Big Data Analyst. Primary technical skills in HDFS, Scala, Spark, MapReduce, YARN, Hive, Sqoop, HBase, Flume, Oozie and ZooKeeper.
- Good experience in designing and deploying Hadoop clusters and Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark and Impala with the Cloudera distribution.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Working experience with Big Data and the Hadoop Distributed File System (HDFS). In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts, and experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Good understanding of Kafka architecture and experienced in writing Spark Streaming jobs with Kafka.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Experience creating scripts for data modeling and for data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Developed a data pipeline using Kafka, HBase, Mesos, Spark and Hive to ingest, transform and analyze customer behavioral data.
- Integrated Spark, Kafka and HBase to power a real-time dashboard.
- Excellent ability to use analytical tools to mine data and evaluate the underlying patterns.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data. Worked with Apache Spark for quick analytics on object relationships.
- Hands-on knowledge of RDD and DataFrame transformations in Spark.
- Involved in loading, transforming and analyzing transaction data from various providers into Hadoop on an ongoing basis.
- Experienced in processing different file formats such as Avro, Parquet, CSV, JSON and SequenceFile using MapReduce programs and Spark.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
TECHNICAL SKILLS:
Programming Languages: Java, Scala, C/C++, PL/SQL, Shell
Hadoop Ecosystem: Spark, HDFS, MapReduce, Hive, HBase, Kafka, ZooKeeper, Sqoop, Flume, Oozie, YARN, Solr
Development Tools: Eclipse, Maven, DbVisualizer, PuTTY, Git, sbt
Databases: MySQL, Oracle 11g, HBase, MongoDB, Cassandra
Web Development: HTML5, CSS3, JavaScript, jQuery, Bootstrap
Frameworks: Spring, JUnit, Log4j
PROFESSIONAL EXPERIENCE:
Spark and Hadoop Developer
Confidential, Brentwood, TN
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers.
- Developed a data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
- Ran Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
- Collected JSON data from an HTTP source and developed Spark APIs that help perform inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Involved in HBase setup and in storing data in HBase for analysis.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: Scala, HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
Hadoop Developer
Confidential, Crestwood, IL
Responsibilities:
- Developed Spark applications to perform all the data transformations on user behavioral data coming from multiple sources.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Responsible for managing data coming from different sources.
- Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
- Performed file system management and monitoring on Hadoop log files.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Performed masking on customer sensitive data using Flume interceptors.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
- Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Worked on large sets of structured, semi-structured and unstructured data.
Environment: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Oozie, Maven, Shell Scripting, Spark, Scala, Cloudera Manager.
Hadoop Developer
Confidential, Wheatfield, IN
Responsibilities:
- Set up a Hadoop cluster on Amazon EC2.
- Analyzed the Hadoop cluster and different big data tools including Pig, HBase and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
- Managed resources of the Hadoop cluster, including adding and removing cluster nodes for maintenance and capacity needs.
- Involved in loading data from the UNIX file system to HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Imported data from MySQL into HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Gained good experience with Solr and the NoSQL database HBase.
Environment: Apache Hadoop, HDFS, Hive, Flume, HBase, Sqoop, Pig, Java, Eclipse, MySQL, ZooKeeper, Amazon EC2, Solr.
Hadoop Developer/Admin
Confidential, West Hartford, CT
Responsibilities:
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on RHEL. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Reviewed Hadoop log files.
Environment: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Maven, Shell Scripting, CDH3.
Hadoop Developer
Confidential, Stratford, CT
Responsibilities:
- Involved in gathering and analyzing business requirements and designing the Hadoop stack as per the requirements.
- Developed UNIX shell scripts to load a large number of files into HDFS from the Linux file system.
- Worked with Sqoop import and export functionality to handle large data set transfers between Oracle databases and HDFS.
- Wrote Flume configuration files for importing streaming log data into HBase.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Performed masking on customer sensitive data using Flume interceptors.
- Developed Custom Input Formats in MapReduce jobs to handle custom file formats and to convert them into key-value pairs.
- Experience in handling data in different file formats such as Text, SequenceFile, Avro and RCFile.
- Wrote MapReduce jobs for data processing and stored the results in HBase for BI reporting.
- Worked with BI teams to generate reports and design ETL workflows on Tableau.
- Experience developing Pig Latin and HiveQL scripts and using other Hadoop ecosystem tools for trend analysis and pattern recognition on user data.
- Developed and executed shell scripts to automate the jobs.
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Impala, Flume, HBase, Pig, Java, SQL, CDH, UNIX, Shell Scripting.
Java/J2EE Developer
Confidential, Dubuque, IA
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC).
- Involved in team discussions regarding modeling, architectural and performance issues.
- Using UML, developed use case, class and sequence diagrams in Visual Paradigm to represent the dynamic view of the system.
- Followed agile methodology and took part in daily Scrum meetings, sprint planning, showcases and retrospectives.
- Understood the business requirements of the project and coded in accordance with the technical design document.
- Prepared high-level design documents as well as test cases for unit testing of the project.
- Fixed bugs and defects raised during System Testing and User Acceptance Testing.
- Handled critical production support calls quickly, where turnaround time is essential.
- Provided project induction training to new team members.
- Coordinated with the on-site team for timely project delivery and query resolution.
- Worked closely with the Transaction Team, which is responsible for creating the visual layouts of the screens.
Environment: Java 1.2/1.3, Applet, Servlet, JSP, custom tags, JDBC, XML, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, JUnit, Log4J, RDBMS.
