Spark And Hadoop Developer Resume
Houston
SUMMARY:
- A seasoned technologist with 8 years of IT experience, including 4 years in the Big Data ecosystem (Spark and Hadoop).
- Proven experience in web log analysis on Apache Spark using Scala, including analyzing jobs on the Spark UI.
- Hands-on expertise in handling RDDs and Spark SQL in Scala.
- Worked on Spark Streaming of live data, tuning batch and sliding-window intervals to partition the stream into blocks (see the sketch after this list).
- Involved in developing MapReduce programs in Hadoop and Spark environments.
- Good knowledge of the Hadoop ecosystem: HDFS, Hive, Pig, Sqoop, and Oozie.
- Adept at HDFS architecture concepts.
- Used Apache Kafka, consuming data that producers publish to brokers.
- Worked as an Oracle PL/SQL developer responsible for the analysis and design of business applications using the Oracle RDBMS.
- Worked as a Java/J2EE developer and gained expertise in object-oriented programming concepts.
- Good knowledge of key Oracle performance features such as the query optimizer, explain plans, and indexes.
- Good knowledge of Python and Scala programming concepts.
- Good working knowledge of AWS and Google Cloud.
- Worked with EC2 and S3 for small datasets.
- Experience in capturing and baselining raw requirements and transforming them into business requirement and functional specification documents.
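A minimal sketch of the windowed stream handling described above, using Spark Streaming's DStream API. The socket source, checkpoint path, and interval values are illustrative assumptions, not details from a specific engagement.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowedWordCount").setMaster("local[2]")
    // Batch interval: how often the live stream is cut into RDD blocks.
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/windowed-wordcount") // needed for stateful window ops

    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      // Sliding window: aggregate the last 60s of data, recomputed every 20s.
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(20))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```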
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, Apache Spark, Kafka, HDFS, Hive, Pig, and Oozie
Programming Languages: PL/SQL, C, Java/J2EE, Scala, Python
Tools: Toad, Microsoft Visio, PuTTY, FileZilla, IBM Rational ClearCase, Tomcat web server, Jenkins, Git
Web Technologies: HTML
Development Tools: SQL Developer, Eclipse
Development Methodologies: Waterfall, Agile
Databases: Oracle, MongoDB (NoSQL), HDFS (distributed file system)
PROFESSIONAL EXPERIENCE:
Confidential, Houston
Spark and Hadoop Developer
Responsibilities:
- Worked on web log analysis on Spark using Scala and analyzed jobs on the Spark UI.
- Involved in writing MapReduce-style programs in Scala from Eclipse and the terminal.
- Handled live stream data by tuning batch and sliding-window intervals to partition the stream into blocks.
- Programmed in Scala using key functional and object-oriented features; proficient with transformations such as map, filter, reduceByKey, and groupByKey.
- Developed MapReduce programs in the Hadoop environment and stored the refined data in staging tables.
- Involved in performance tuning of SQL queries in Spark SQL.
- Developed Kafka-Spark streaming jobs to read real-time messages from Kafka topics, publish them to Solace topics, and write the data to HDFS with zero data loss (see the sketch after this list).
- Wrote complex Hive queries to extract data from heterogeneous sources in the data lake.
- Processed Parquet files from the data lake using Apache Spark and Scala.
- Used Sqoop to bring data into HDFS and Hive and created reports on it.
- Analyzed client systems and gathered requirements from the business.
- Walked the development and QA teams through the requirements and triggered kick-off meetings.
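A hedged sketch of the Kafka-to-HDFS leg of the streaming job mentioned above, using Spark Structured Streaming. The broker address, topic name, and paths are placeholders, and the Solace-publishing leg is omitted because it depends on the Solace client library in use; the checkpoint location is what lets Spark recover the stream without losing data.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaToHdfs").getOrCreate()

    // Read real-time messages from a Kafka topic (names are illustrative).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "weblogs")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Checkpointing plus Spark's transactional file sink is what backs
    // the "zero data loss" guarantee on restarts and failures.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/weblogs")
      .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
      .start()

    query.awaitTermination()
  }
}
```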
Confidential, Chicago
Hadoop Developer
Responsibilities:
- Developed Spark/Scala jobs to create files and load them into Apache HBase, the NoSQL database consumed by the real-time applications (see the sketch after this list).
- Created data pipelines to implement business solutions and generate files fed to downstream systems and the NoSQL database.
- Used Oozie to automate data loading into HDFS and Pig to process the data.
- Created HBase tables to load structured, semi-structured, and unstructured data.
- Worked extensively with Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
- Developed MapReduce programs to clean and aggregate the data.
- Responsible for building scalable, distributed data solutions using Hadoop and Spark.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Impala, ZooKeeper, and Pig jobs that run independently based on time and data availability.
- Implemented ad-hoc Hive queries to handle member data from sources such as Epic and Centricity.
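A minimal sketch of loading Spark output into HBase as described in the first bullet, using the plain HBase client API; the table name, column family, and input path are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object LoadToHBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("LoadToHBase").getOrCreate()
    val records = spark.read.parquet("hdfs:///staging/members") // illustrative path

    records.rdd.foreachPartition { rows =>
      // Open one HBase connection per partition, not per row.
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("member_profile")) // hypothetical table
      rows.foreach { row =>
        val put = new Put(Bytes.toBytes(row.getAs[String]("member_id")))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"),
          Bytes.toBytes(row.getAs[String]("name")))
        table.put(put)
      }
      table.close()
      conn.close()
    }
    spark.stop()
  }
}
```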
Confidential, Indianapolis
Hadoop developer
Responsibilities:
- Extensively worked on creating Hive external and internal (managed) tables and applied HiveQL to aggregate the data.
- Implemented partitioning and bucketing in Hive for efficient data access (see the sketch after this list).
- Developed custom UDFs in Java where the required functionality was too complex for built-in functions.
- Integrated the Hive warehouse with HBase.
- Used Sqoop to import data from RDBMS to HDFS and later analyzed the data using various Hadoop components.
- Worked on MapReduce jobs to standardize and clean the data before calculating aggregations.
- Developed Pig Latin scripts to sort, group, join, and filter enterprise data.
- Used Sqoop for incremental imports and exports between RDBMS and HDFS.
- Used Oozie as a job scheduler to run multiple Hive and Pig jobs.
- Used Sqoop to bring data into HDFS and Hive and created reports on it.
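The DDL and writer call below sketch the partitioning and bucketing referred to above; the table names, columns, paths, and bucket count are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionBucket {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("HivePartitionBucket")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioning: each order_date value becomes its own HDFS directory,
    // so filters on order_date skip whole directories (partition pruning).
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales (
        order_id STRING,
        customer_id STRING,
        amount DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      STORED AS ORC
    """)

    // Bucketing: hash customer_id into a fixed number of files (a
    // Spark-managed bucketed table) so joins and aggregations on that
    // key shuffle less data.
    spark.read.parquet("hdfs:///staging/sales") // illustrative source
      .write
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("sales_bucketed")
  }
}
```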
Confidential, Charlotte, NC
Data Analyst
Responsibilities:
- Involved in the design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.
- Prepared high-level logical data models, BRDs (Business Requirement Documents), and supporting documents.
- Part of the team conducting logical data analysis and data-modeling sessions; communicated data-related standards and led design discussions to arrive at the appropriate data mart.
- Defined facts and dimensions and designed the data marts in ERwin using Ralph Kimball's dimensional modeling methodology (a minimal star-schema illustration follows this list).
- Designed star schemas for the POS application, including detailed, plan, and monthly-summary data marts, with dimensions such as Time, Services, and Customers and various fact tables.
- Worked on integrating data between different sources.
- Walked the development and QA teams through the requirements and triggered kick-off meetings.
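For illustration only, a minimal star schema in the Kimball style: one dimension plus one fact table. The project's actual models were built in ERwin against Oracle; the Spark SQL rendering and all table and column names here are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object StarSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StarSchemaSketch")
      .enableHiveSupport().getOrCreate()

    // A dimension table: one row per customer, descriptive attributes only.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key  BIGINT,
        customer_name STRING,
        segment       STRING
      )
    """)

    // The fact table holds measures plus foreign keys to each dimension;
    // queries join fact -> dimensions, giving the "star" shape.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS fact_pos_sales (
        date_key     BIGINT,
        customer_key BIGINT,
        service_key  BIGINT,
        sale_amount  DOUBLE,
        units_sold   INT
      )
    """)
  }
}
```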
Confidential, Brussels
PL/SQL Developer
Responsibilities:
- Created records, tables, and collections (nested tables and arrays) to improve query performance by reducing context switching.
- Identified and executed automated test cases and worked toward optimal use of the automation environment using VB.
- Interacted with the client, gathered requirements, and provided estimates for development projects.
- Generated output files by executing Maestro jobs on a daily basis and sent those files to other applications via FTP.
- Developed Unix scripts to execute batches per requirements.
- Developed PL/SQL stored procedures per requirements and integrated them with Java code for execution (see the sketch after this list).
- Identified upstream and downstream applications and published the testing and development schedule for the integrated environment.
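A sketch of invoking a PL/SQL stored procedure from JVM code over JDBC, the integration pattern described above. It is shown in Scala to keep these examples in one language; the procedure name, parameters, and connection string are hypothetical.

```scala
import java.sql.DriverManager

object CallBatchProc {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
    try {
      // PROCESS_DAILY_BATCH is a hypothetical PL/SQL procedure taking a
      // run-date IN parameter and returning a status OUT parameter.
      val stmt = conn.prepareCall("{ call PROCESS_DAILY_BATCH(?, ?) }")
      stmt.setString(1, "2015-06-30")
      stmt.registerOutParameter(2, java.sql.Types.VARCHAR)
      stmt.execute()
      println(s"Batch status: ${stmt.getString(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```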
Confidential, New York
Java/J2EE developer
Responsibilities:
- Developed modules using JSP and JavaScript.
- Developed and executed JUnit test cases for the modules (a minimal example follows this list).
- Designed database logic in PL/SQL, writing stored procedures, functions, and triggers.
- Attended requirement-analysis calls, raised clarifications, and tracked them to closure.
- Led the migration of monthly statements from Unix systems to MVC-based web systems.
- Performed sanity testing in the QA environment to check for code breakage before deployment.
- Actively participated in the production deployment phase.
- Tracked project health from the requirement-analysis phase to closure and owned sign-off of the project to production.
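A minimal JUnit 4 test in the spirit of the second bullet. The project's tests targeted Java modules; this one is written in Scala for consistency with the other sketches, and StatementFormatter is a hypothetical class under test.

```scala
import org.junit.Assert.assertEquals
import org.junit.Test

// StatementFormatter is a hypothetical module under test: it renders an
// amount in cents as a dollar string for a monthly statement.
object StatementFormatter {
  def formatAmount(cents: Long): String =
    f"$$${cents / 100}%d.${cents % 100}%02d"
}

class StatementFormatterTest {
  @Test
  def formatsWholeDollarsAndCents(): Unit = {
    assertEquals("$12.05", StatementFormatter.formatAmount(1205))
    assertEquals("$0.99", StatementFormatter.formatAmount(99))
  }
}
```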