Sr. Hadoop/Spark Developer Resume

Bloomington, IL

SUMMARY

  • Over 13 years of professional experience in Information Technology, with extensive experience as a Big Data Engineer and Hadoop/Java Developer, and in AWS, enterprise application software development and maintenance in Java/J2EE technologies, data warehousing, data/system analysis and legacy applications.
  • Expert at full life cycle implementation using CDH (Cloudera).
  • Experienced in the design, development and implementation of data pipelines using Spark/Scala applications.
  • Good knowledge of and experience with integration tools like Jenkins and deployment tools like Confidential UrbanCode for setting up CI/CD pipelines.
  • Experienced with the AWS cloud platform, performing big data analytics using EMR clusters.
  • Experienced with the development and implementation of various AWS services including Athena, Lambda, EC2, Elasticsearch, DynamoDB, Aurora, Redshift, CloudWatch, S3 and Glue.
  • Developed Python scripts to handle complex data structures for analytics and for data ingestion into the enterprise cluster.
  • Experienced in developing and deploying enterprise applications using major Hadoop ecosystem components such as Job Tracker, Task Tracker, NameNode, DataNode, MapReduce concepts and the YARN architecture (Node Manager, Resource Manager), along with Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, ZooKeeper and Cassandra.
  • Experienced in scripting using UNIX shell scripts.
  • Experience in using different file formats such as CSV, SequenceFile, Avro, RC, ORC, JSON and Parquet, and compression techniques such as LZO, Gzip, Bzip2 and Snappy; wrote abstract loaders and handled reading and managing metadata.
  • Experienced in implementing ad-hoc queries using Hive Query Language, partitioning, bucketing and custom Hive UDFs.
  • Experience in transferring streaming data and data from different data sources into HDFS and NoSQL databases using Apache Flume and Apache Kafka.
  • Experience in creating complex data warehouse and application roadmaps and BI implementations, specializing in Confidential platforms and in the design, development and maintenance of ETL code.
  • Analyzed large data sets and migrated ETL operations using Pig Latin scripts, operators and UDFs.
  • Expertise in implementing Service Oriented Architectures using XML-based web services such as SOAP, UDDI and WSDL.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Hands-on programming experience in various technologies such as Java, HTML, XML and Scala.
  • Strong experience in developing applications using Java EE technologies, including Servlets, Struts, JSP and JDBC.
  • Extensive development experience in different IDEs such as Eclipse, NetBeans, IntelliJ and STS.
  • Strong experience in core SQL and RESTful web services.
  • Experienced with programming languages such as Core Java and JavaScript.
  • Very good experience in Scrum, Agile and Waterfall models.
  • Good experience in Tableau and Looker for data visualization and analysis of large data sets, drawing various conclusions.
  • Strong team player with good communication, analytical, presentation and inter-personal skills.

TECHNICAL SKILLS

Big Data Technologies: MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Storm, Oozie, Flume, Cloudera, Hortonworks, AWS

Spark components: RDD, Spark SQL (Data Frames and Dataset), Spark Streaming.

Programming Languages: Python 3, SQL, Scala 2.11, Core Java, Legacy application

Databases: Confidential, Confidential, DB2

NoSQL Databases: DynamoDB, Aurora

Data warehouse tool: Redshift

Serverless Applications: AWS Lambda, Glue

Scripting and Query Languages: Shell scripting, SQL and PL/SQL.

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Jenkins, Maven, Ant.

IDE’s & Command line tools: Eclipse, IntelliJ, Toad and WinSCP.

BI Tools: Looker, Tableau, Qlikview, Cognos

PROFESSIONAL EXPERIENCE

Confidential, Bloomington, IL

Sr. Hadoop/Spark Developer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Worked in Agile methodology, where each sprint ran for 2 weeks.
  • Extensively worked on Text, ORC and Parquet file formats and compression techniques such as Gzip.
  • Developed Hive scripts to load data from HDFS into Hive tables.
  • Extensively worked on Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Designed and developed Spark/Scala models that process the data, identify bills more than 2 days past due and send notifications to the customers (a minimal sketch follows this list).
  • Implemented a cron job to run the shell script that transfers data from the edge node to HDFS.
  • Automated the Hadoop pipeline using Oozie and scheduled it with a coordinator based on time frequency and data availability.
  • Developed job processing scripts using Oozie workflow.
  • Worked with the Spark SQL context to create DataFrames that filter input data for model execution.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and appropriate memory settings (see the configuration sketch below).
  • Experienced in writing Hive validation scripts that are used in validation framework (for daily analysis through graphs and presented to business users).
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and then loading the transformed data back into HDFS.
  • Used Impala to read, write and query the Hadoop data in HDFS.
  • Used QlikView as the BI tool for research on the data.
  • Used AWS Glue to load data into S3 buckets and built RDS on top of it for analytics purposes.
  • Used CloudWatch for monitoring the AWS jobs.
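
A minimal sketch of the past-due notification model described in the bullets above, assuming hypothetical table, column and path names (billing.open_bills, due_date, the output path); it illustrates only the filter-and-write flow, not the production job:

```scala
import java.time.LocalDate
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PastDueNotifier {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("past-due-bill-notifications")
      .enableHiveSupport() // read the partitioned/bucketed Hive tables directly
      .getOrCreate()

    // Hypothetical Hive table holding open bills.
    val bills = spark.table("billing.open_bills")

    // A bill qualifies once today is more than 2 days beyond its due date.
    val pastDue = bills
      .filter(col("status") === "OPEN")
      .filter(datediff(current_date(), col("due_date")) > 2)
      .select("customer_id", "bill_id", "due_date", "amount_due")

    // Hand the candidates to the downstream notification step (path and format assumed).
    pastDue.write.mode("overwrite")
      .parquet(s"/data/notifications/past_due_bills/run_date=${LocalDate.now()}")

    spark.stop()
  }
}
```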

Environment: Hadoop, HDFS, HiveQL, Oozie, Impala, Cloudera (CDH 5), UNIX Shell Scripting, Spark/Scala, Kafka, uDeploy, Jenkins, Hue, cron, AWS S3, RDS, Glue, EMR, CloudWatch, Athena, Redshift, DynamoDB, Lambda, EC2, Elasticsearch, EBS
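
For the Spark tuning bullet above (batch interval, parallelism, memory), a sketch of where those knobs live; the concrete values and the socket source are placeholders, not the settings actually used:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TunedStreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("tuned-streaming-job")
      .set("spark.executor.memory", "6g")          // memory tuning (placeholder value)
      .set("spark.default.parallelism", "200")     // level of parallelism for RDD operations
      .set("spark.sql.shuffle.partitions", "200")  // parallelism for DataFrame shuffles

    // Batch interval sized so each micro-batch finishes before the next one arrives.
    val ssc = new StreamingContext(conf, Seconds(30))

    // Placeholder source and action so the context has an output operation to run.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```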

Confidential, Bloomfield, CT

Sr. Bigdata Developer/Analyst

Responsibilities:

  • Gathered requirements, including requisite technical data such as source files, installation and uninstallation instructions, and any configuration details.
  • Evaluated requests and participated in functional specification reviews to determine feasibility and estimate the effort required.
  • Developed structured, standard packages in Scala on Spark for data processing.
  • Developed scripts in Hive and Impala for querying data stored in the Hadoop Distributed File System.
  • Created Sqoop scripts for data ingestion, with ESP as the scheduler.
  • Created workflows in Oozie and used Cloudera Manager to monitor the health of the cluster environment.
  • Used Jenkins to deploy JAR files (executables) into the cluster environment.
  • Maintained the code in the Git repository.
  • Configured Hive support to connect to the Hive database from the Spark/Scala program (see the sketch after this list).
  • Checked Oozie logs for troubleshooting in case of failures.
  • Added an asset management footprint to keep track of all programs at the State Farm enterprise level.
  • Created documentation for each package describing the package creation and installation logic, any specific prerequisites for package installation, and the operating procedure used for further assessment.
  • Supported solving and troubleshooting any distribution issues once a package went live in production.
  • Archived the programs to a separate library server using an in-house tool after the programs were successfully deployed to production.
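
A rough sketch of wiring Hive support into the Spark/Scala program, as referenced in the list above; the warehouse path and the database/table names are assumptions used only for illustration:

```scala
import org.apache.spark.sql.SparkSession

object HiveEnabledJob {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets the program resolve tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-enabled-processing")
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // assumed location
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical database/table; the same query could also run from Hive or Impala.
    val claims = spark.sql(
      "SELECT * FROM analytics_db.claims WHERE load_date = current_date()")

    claims.groupBy("claim_type").count().show()

    spark.stop()
  }
}
```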

Environment: Hadoop, HDFS, Kafka, HiveQL, Oozie, cron, Impala, Cloudera (CDH 5), Confidential, UNIX Shell Scripting, Looker, Spark/Scala, IntelliJ

Confidential

Big Data - Hadoop Lead Developer

Responsibilities:

  • Involved in Installing, Configuring Hadoop components using CDH 5.2 Distribution.
  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Wrote MapReduce code to parse data from various sources and store the parsed data into HBase and Hive.
  • Worked on creating combiners, partitioners and distributed cache to improve the performance of MapReduce jobs.
  • Developed a shell script to perform data profiling on the ingested data with the help of Hive bucketing.
  • Responsible for debugging and optimizing Hive scripts, and for implementing de-duplication logic in Hive using a rank function (UDF); see the sketch after this list.
  • Experienced in writing Hive validation scripts that are used in validation framework (for daily analysis through graphs and presented to business users).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
  • Developed code in Java using the MapReduce framework and Hadoop Streaming.
  • Used Pig as ETL tool to do transformations, joins and some pre-aggregations before storing the data into HDFS.
  • Imported all the customer-specific personal data to Hadoop using the Sqoop component from various relational databases like Netezza and Confidential.
  • Experience in streaming log data using Flume and data analytics using Hive.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Extracted the data from RDBMS (Confidential, MySQL & Confidential) to HDFS using Sqoop.
  • Did a POC on Spark comparing the execution times of existing MapReduce jobs written in Hive against equivalent Spark jobs.
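
The rank-based de-duplication mentioned above follows a standard HiveQL window pattern; it is shown here submitted through Spark's Hive support only to keep one language across these sketches, and every database, table and column name is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveDedup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-dedup")
      .enableHiveSupport()
      .getOrCreate()

    // Keep only the most recent record per business key.
    spark.sql(
      """
        |INSERT OVERWRITE TABLE curated.customer_events
        |SELECT customer_id, event_type, event_ts, payload
        |FROM (
        |  SELECT *,
        |         RANK() OVER (PARTITION BY customer_id, event_type
        |                      ORDER BY event_ts DESC) AS rnk
        |  FROM staging.customer_events
        |) ranked
        |WHERE rnk = 1
      """.stripMargin)

    spark.stop()
  }
}
```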

Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, Oozie, Flume, Impala, Cloudera, MySQL, UNIX Shell Scripting.

Confidential, Bloomfield, CT

Big Data - Hadoop Developer

Responsibilities:

  • Responsible for loading the customer’s data and event logs from the Confidential database and Confidential into HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Implemented different analytical algorithms using MapReduce programs to apply on top of HDFS data.
  • Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access (see the sketch after this list).
  • Designed a data warehouse using Hive and created Partitioned tables in Hive.
  • Loaded and transformed large data sets of structured, semi-structured and unstructured data.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Worked on Hive by creating external and internal tables, loading them with data and writing Hive queries.
  • Created HBase tables to store data from different sources.
  • Worked with various Hadoop file formats, including text files, SequenceFile and RCFile.
  • Created/modified shell scripts for scheduling various data cleansing scripts and the ETL load process.
  • Performed Object Oriented Analysis and Design using UML, including development of class diagrams, sequence diagrams and state diagrams, and implemented these diagrams in Microsoft Visio.
  • Designed and created Java Objects, JSP pages, JSF, JavaBeans and Servlets to achieve various business functionalities. Created validation methods using JavaScript and Backing Beans.
  • Involved in writing client side validations using JavaScript, CSS.
  • Involved in the design of the Referential Data Service module to interface with various databases using JDBC.
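
A sketch of the Hive partitioning, dynamic-partition and bucketing setup referenced above; database, table and column names are illustrative, and the bucketed table is only created here since bucketed loads are normally driven from Hive itself:

```scala
import org.apache.spark.sql.SparkSession

object HiveWarehouseSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioned-tables")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partitioning must be enabled before a dynamic-partition insert.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
    spark.sql("CREATE DATABASE IF NOT EXISTS warehouse")

    // Partitioned fact table (illustrative layout).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS warehouse.events_by_date (
        |  customer_id STRING, event_type STRING, amount DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Bucketed variant for efficient sampling and joins on customer_id.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS warehouse.events_bucketed (
        |  customer_id STRING, event_type STRING, amount DOUBLE)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: partition values come from the data itself.
    spark.sql(
      """INSERT OVERWRITE TABLE warehouse.events_by_date PARTITION (event_date)
        |SELECT customer_id, event_type, amount, event_date
        |FROM staging.raw_events""".stripMargin)

    spark.stop()
  }
}
```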

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Zookeeper, AWS, Oozie, Cloudera, Scala, Confidential 11g/10g, Windows NT, UNIX Shell Scripting.

Confidential, Windsor, CT

Java/J2EE Developer

Responsibilities:

  • Involved in Full Life Cycle Development in Distributed Environment using Java and J2EE framework.
  • Designed the application by implementing the Struts Framework based on MVC architecture.
  • Designed and developed the front end using JSP, HTML, JavaScript and jQuery.
  • Implemented the web service client for login authentication, credit reports and applicant information using the Apache Axis 2 web service.
  • Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
  • Developed framework for data processing using Design patterns, Java, XML.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
  • Designed and developed Session beans to implement the Business logic.
  • Developed EJB components that were deployed on the WebLogic Application Server.
  • Wrote unit tests using the JUnit framework; logging was done using the Log4J framework.
  • Designed and developed various configuration files for Hibernate mappings.
  • Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
  • Developed Web Services for sending and getting data from different applications using SOAP messages.
  • Actively involved in code reviews and bug fixing.
  • Applied CSS (Cascading Style Sheets) across the entire site for standardization.
  • Assisted QA Team in defining and implementing a defect resolution process including defect priority, and severity.

Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Confidential 10g, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.

Confidential

Java Developer

Responsibilities:

  • Assisted in designing and programming for the system, which includes development of Process Flow Diagram, Entity Relationship Diagram, Data Flow Diagram and Database Design.
  • Involved in Transactions, login and Reporting modules, and customized report generation using Controllers, Testing and debugging the whole project for proper functionality and documenting modules developed.
  • Designed front end components using JSF.
  • Involved in developing Java APIs, which communicates with the Java Beans.
  • Implemented MVC architecture using Java, custom tag libraries and JSTL.
  • Involved in development of POJO classes and writing Hibernate query language (HQL) queries.
  • Implemented MVC architecture and DAO design pattern for maximum abstraction of the application and code reusability.
  • Created Stored Procedures using SQL/PL-SQL for data modification.
  • Used XML, XSL for Data presentation, Report generation and customer feedback documents.
  • Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
  • Developed JUnit test cases for regression testing and integrated with ANT build.
  • Implemented Logging framework using Log4J.
  • Involved in code review and documentation review of technical artifacts.

Environment: J2EE/Java, JSP, Servlets, JSF, Hibernate, Spring, JavaBeans, XML, XSL, HTML, DHTML, JavaScript, CVS, JDBC, Log4J, Confidential 9i, Confidential WebSphere Application Server

Confidential

Java Developer

Responsibilities:

  • Developed Servlets and Java Server Pages (JSP).
  • Writing Pseudo-code for Stored Procedures.
  • Developed PL/SQL queries to generate reports based on client requirements.
  • Enhancement of the System according to the customer requirements.
  • Designed and Developed UI pages in CBMS application using CBMScustomframework, business objects, JDBC, JSP and javascript.
  • Involved in business requirement gatherings, development of technical design documents and design of real time eligibility project.
  • Developed Real Time Eligibility web service using CBMScustomframework, AJAX 2.0, WSDL and SOAP UI.
  • Used JAXB Marshaller and Unmarshaller to marshall and unmarshall WSDL request.
  • Developed all WSDL components, XSD, producing and consuming WSDL web services using AJAX 1.5 and AJAX 2.0.
  • Development of java services using java code, SQL queries, JDBC, Spring and hibernate entities.
  • Used to Eclipse for development, debugging and deployment of the code. Created test case scenarios for Functional Testing.
  • Used Java Script validation in JSP pages.
  • Helped design the database tables for optimal storage of data.
  • Coded JDBC calls in the Servlets to access the Confidential database tables.
  • Responsible for Integration, unit testing, system testing and stress testing for all the phases of project.
  • Prepared final guideline document that would serve as a tutorial for the users of this application.

Environment: Java 1.4, Servlets, J2EE 1.4, JDBC, Confidential 9i, PL SQL, HTML, JSP, Eclipse, UNIX.

Confidential

Systems/Data Analyst

Responsibilities:

  • Developed report automation processes by creating a centralized data hub for multiple transactional data sources using SSIS and SQL to improve reporting capabilities and turnaround time by 35%
  • Created and implemented automation process for user credential validation of more than 350 user roles for 3 client product environments to save 40 man hours per week
  • Developed the framework and automated test cases using JavaScript which handles several functionalities and multiple inputting fields with frequent interactions to JSON files and databases
  • Recommended new methods for automating test results, which eliminated the manual effort to mark test case status, thereby increasing efficiency by 20%
  • Restored databases, set up an environment, and configured the application as per requirements; performed data analysis on enterprise-level databases, which improved accuracy by 21%
  • Aggregated data from multiple data sources into Excel and prepared area-wise reports to forecast where demand is highest
  • Analyzed and addressed data gaps and data quality concerns to improve application performance by 20%
  • Responsible for documenting workflows and results of Business Analysis into Business Requirement Document (BRD) and obtaining sign-off from the client on specifications
  • Mentored and trained 2 recruits on JavaScript, SQL programming, existing framework and product data analysis
  • Gathered and analyzed data from different stakeholders to assist product teams on the daily execution of program

Environment: Java 1.4, Servlets, J2EE 1.4, JDBC, Confidential 9i, PL SQL, HTML, JSP, Eclipse, UNIX.

Confidential

Programmer Analyst

Responsibilities:

  • Gathered and analyzed data from different stakeholders to assist product teams on the daily execution of program
  • Solved incident tickets and took care of end-to-end deployment
  • Designed the modules; developed, unit tested and integrated them with other dependent modules
  • Interacted with QA and resolved any defects that were opened
  • Coordinated with the onshore team on status, kept the project on track and ensured successful delivery.

Environment: COBOL, JCL, Confidential Utilities, Java 1.4, Endevor, SPUFI, Eclipse, JDBC, DB2, SQL
