Hadoop/Spark Developer Resume
Atlanta, GA
PROFESSIONAL SUMMARY:
- 7 years of professional experience in information technology, including 4 years developing Big Data and Hadoop ecosystem components.
- Over 3 years of extensive experience in Java, J2EE technologies, database development, and data analytics.
- Hands-on experience developing Big Data projects using Hadoop, Hive, Sqoop, Oozie, Pig, Flume, and MapReduce open-source tools/technologies.
- Experience in writing Pig Latin and HiveQL scripts and extending their functionality using User Defined Functions (UDFs).
- Hands-on experience with performance optimization techniques for data processing in Hive.
- Wrote complex MapReduce code by implementing custom Writable and WritableComparable interfaces.
- Good exposure to working with various file formats (Parquet, Avro, and JSON).
- Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
- Developed applications using Spark for data processing.
- Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions (see the sketch after this list).
- Capable of using AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Good knowledge on Spark architecture and real-time streaming using Spark.
- Fluent with core Java concepts such as I/O, multithreading, exceptions, regular expressions, and collections.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Experience in Java, JSP, Servlets, WebLogic, WebSphere, JavaScript, Ajax, jQuery, and XML.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Well-versed with Agile and Waterfall methodologies.
- Strong team player with good communication, analytical, presentation and interpersonal skills.
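A minimal sketch of the kind of migration referenced above, assuming a hypothetical click-log dataset on HDFS: the aggregation an old MapReduce job would express as map (page, 1) / reduce sum is written as Spark DataFrame transformations. The paths, column names, and application name are illustrative assumptions, not details from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyClickAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DailyClickAggregation").getOrCreate()

    // Hypothetical JSON click logs landed on HDFS
    val clicks = spark.read.json("/data/raw/clicks")

    // Equivalent of the old map -> shuffle -> reduce aggregation:
    // keep successful requests, then count hits per page per day
    val dailyCounts = clicks
      .filter(col("status") === 200)                 // assumed field
      .groupBy(col("page"), col("event_date"))       // assumed fields
      .agg(count("*").as("hits"))

    dailyCounts.write.mode("overwrite").parquet("/data/curated/daily_click_counts")

    spark.stop()
  }
}
```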
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Cassandra, MongoDB, ZooKeeper, Hive, Pig, Sqoop, Flume, and Oozie.
Operating Systems: Windows, UNIX, LINUX.
Programming Languages: C, Java, PL/SQL, Scala
Scripting Languages: JavaScript, Shell Scripting
Web Technologies: HTML, XHTML, XML, CSS, JavaScript, JSON, SOAP, WSDL.
Hadoop Distribution: Cloudera, Hortonworks.
Java/J2EE Technologies: Java, J2EE, JDBC.
Databases: Oracle, MS Access, MySQL, SQL Server, NoSQL.
IDE: Eclipse, IntelliJ, SBT.
Methodologies: J2EE Design Patterns, Scrum, Agile, Waterfall
Version Control: SVN, Git, GitHub, Bitbucket
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Hadoop/Spark Developer
Responsibilities:
- Imported data from Teradata into HDFS using the Spark API and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression (see the sketch after this list).
- Involved in running Hive scripts through Hive, Impala, and Hive on Spark.
- Involved in performance tuning of Hive from design, storage, and query perspectives.
- Collected JSON data from an HTTP source and developed Spark APIs to perform the inserts.
- Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
- Worked with Spark-SQL context to create data frames to filter input data for model execution.
- Experienced in performance tuning of Spark applications, including setting the right batch interval time.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (a consumer sketch follows this list).
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
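A minimal sketch of creating the partitioned, bucketed, Snappy-compressed Parquet Hive table described above, written with the Spark DataFrame writer; the database, table, column names, and HDFS path are hypothetical, and reading Avro this way assumes a spark-avro package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object OrdersHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrdersHiveLoad")
      .enableHiveSupport()          // write managed tables into the Hive metastore
      .getOrCreate()

    // Avro files landed by the Sqoop import (hypothetical path)
    val orders = spark.read.format("avro").load("/data/raw/orders")

    orders.write
      .format("parquet")
      .option("compression", "snappy")   // Snappy-compressed Parquet files
      .partitionBy("load_date")          // partition column (assumed)
      .bucketBy(32, "customer_id")       // bucket on the join key (assumed)
      .sortBy("customer_id")
      .saveAsTable("analytics.orders")   // hypothetical Hive database.table

    spark.stop()
  }
}
```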
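And a minimal sketch of a Kafka consumer in Scala along the lines of the consumer bullet above, using the standard kafka-clients API; the broker address, consumer group, and topic name are hypothetical.

```scala
import java.util.{Collections, Properties}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object OrderEventsConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // hypothetical broker
    props.put("group.id", "order-events-group")      // hypothetical consumer group
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("order-events"))  // hypothetical topic

    try {
      while (true) {
        // Poll the topic and hand each record to downstream processing
        val records = consumer.poll(1000L).asScala
        records.foreach(record => println(s"offset=${record.offset()} value=${record.value()}"))
      }
    } finally {
      consumer.close()
    }
  }
}
```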
Environment: HDFS, Yarn, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux Shell Scripting, Cloudera.
Confidential, Minneapolis, MN
Hadoop Developer/Spark Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop and Kafka.
- Converted complex Teradata and Netezza SQLs into Hive HQLs.
- Developed ETL using Hive, Oozie, shell scripts, and Sqoop; coded the components in Scala and used Scala pattern matching (see the sketch after this list).
- Used Flume to collect, aggregate and store the web log data onto HDFS.
- Developed and implemented core API services using Python with Hive.
- Designed NoSQL schemas in HBase.
- Developed MapReduce ETL in Java and Pig.
- Loaded log data into HDFS using Flume.
- Developed simple to complex MapReduce Jobs using Hive and Pig.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
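A minimal sketch of the Scala pattern matching mentioned above, applied to parsing web-log lines of the kind Flume delivered; the log layout, field names, and regex are assumptions for illustration.

```scala
// Hypothetical access-log layout; the real fields depended on the web servers feeding Flume.
case class LogEvent(ip: String, timestamp: String, url: String, status: Int)

object LogParser {
  private val LogPattern =
    """(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) HTTP/\S+" (\d{3}).*""".r

  // Pattern matching against the regex extractor: well-formed lines become
  // LogEvent records, malformed lines are dropped as None.
  def parseLine(line: String): Option[LogEvent] = line match {
    case LogPattern(ip, ts, url, status) => Some(LogEvent(ip, ts, url, status.toInt))
    case _                               => None
  }
}
```

Downstream, a collection of raw lines can then be cleaned with something like lines.flatMap(LogParser.parseLine).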
Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Scala, Oozie, SQL, Flume, Python, Shell Script, DataStage, Hortonworks, Cloudera.
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Loaded the data using Sqoop from different RDBMS Servers like Teradata and Netezza to Hadoop HDFS Cluster.
- Performed daily Sqoop incremental imports scheduled through Oozie.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive queries using map-side joins, dynamic partitions, and bucketing.
- Responsible for executing Hive queries using Hive Command Line under Tez.
- Implemented Hive Generic UDFs to implement business logic around custom data types.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Coordinated the Pig and Hive scripts using Oozie workflow.
- Loaded the data into HBase from HDFS.
- Load and transform large sets of structured, semi structured, and unstructured data that includes Avro, sequence files, and XML files.
Environment: Hadoop, Hortonworks, Big Data, HDFS, MapReduce, Tez, Sqoop, Oozie, Pig, Hive, Linux, Java, Eclipse.
Confidential
Hadoop Developer
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design, and development.
- Analyze large datasets to provide strategic direction to the company.
- Collected the logs from the physical machines and integrated into HDFS using Flume.
- Involved in analyzing the system and business requirements.
- Developed SQL statements to improve back-end communications.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created reports and dashboards using structured and unstructured data.
- Involved in importing data from MySQL to HDFS using Sqoop.
- Involved in writing Hive queries to load and process data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in working with Impala for the data retrieval process.
- Performed sentiment analysis on product reviews from the client's website.
- Developed custom Map Reduce programs to extract the required data from the logs.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in Full Life Cycle Development in Distributed Environment using Java and J2EE Framework.
- Designed the application by implementing Struts Framework based on MVC Architecture.
- Designed and developed the front end using JSP, HTML, JavaScript, and jQuery.
- Implemented the web service client for login authentication, credit reports, and applicant information using the Apache Axis 2 web service.
- Extensively worked on User Interface for few modules using JSPs, JavaScript, and Ajax.
- Developed framework for data processing using Design patterns, Java, XML.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Designed and developed Session beans to implement the Business logic.
- Developed EJB components that are deployed on the WebLogic Application Server.
- Written unit tests using JUnit Framework and Logging is done using Log4J Framework.
- Designed and developed various configuration files for Hibernate mappings.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Developed Web Services for sending and getting data from different applications using SOAP messages.
- Actively involved in code reviews and bug fixing.
- Applied CSS (Cascading Style Sheets) across the entire site for standardization.
- Assisted QA Team in defining and implementing a defect resolution process including defect priority, and severity.
Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle 10g, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.
Confidential
Programmer Analyst
Responsibilities:
- Involved in understanding the functional specifications of the project.
- Assisted the development team in designing the complete application architecture.
- Involved in developing JSP pages for the web tier and validating the client data using JavaScript.
- Developed connection components using JDBC.
- Designed Screens using HTML and images.
- Cascading Style Sheet (CSS) was used to maintain uniform look across different pages.
- Involved in creating Unit Test plans and executing the same.
- Performed document/code reviews and knowledge transfers for status updates on the ongoing project developments.
- Deployed web modules in Tomcat web server.
Environment: Java, JSP, J2EE, Servlets, Java Beans, HTML, JavaScript, JDeveloper, Tomcat Webserver, Oracle, JDBC, XML.