Hadoop/Spark Developer Resume
DE
PROFESSIONAL SUMMARY:
- 8+ years of professional experience in software development, with 5+ years of experience in Big Data technologies including Hadoop and Spark.
- Professional Java developer with strong expertise in data engineering and big data technologies.
- Extensively worked on Spark, Hive, Pig, MapReduce, Sqoop, Kafka, Oozie, HBase, Impala, and YARN.
- Hands on experience in programming using Java, Python, Scala and SQL.
- Sound knowledge of architecture of Distributed Systems and parallel processing frameworks.
- Designed and implemented end-to-end data pipelines to process and analyze massive amounts of data.
- Experienced working with Hadoop distributions both on-prem (CDH, HDP) and in the cloud (AWS).
- Good experience working with various data analytics and big data services in AWS such as EMR, Redshift, S3, Athena, and Glue.
- Experienced in developing production-ready Spark applications using the Spark RDD API, DataFrames, Spark SQL, and the Spark Streaming API.
- Worked extensively on fine-tuning Spark applications to improve performance and on troubleshooting Spark application failures.
- Strong experience using Spark Streaming, Spark SQL, and other Spark features such as accumulators, broadcast variables, different levels of caching, and optimization techniques for Spark jobs.
- Proficient in importing/exporting data between RDBMSs and HDFS using Sqoop.
- Used Hive extensively to perform various data analytics required by business teams.
- Solid experience working with various data formats such as Parquet, ORC, Avro, and JSON.
- Experience automating end-to-end data pipelines with strong resilience and recoverability.
- Strong knowledge of NoSQL databases; worked with HBase, Cassandra, and MongoDB.
- Extensively used various IDEs such as IntelliJ, NetBeans, and Eclipse.
- Expert in SQL; extensively worked with RDBMSs such as Oracle, SQL Server, DB2, MySQL, and Teradata.
- Worked with Apache NiFi to ingest data into HDFS from a variety of sources.
- Proficient with Git, Jenkins, and Maven.
- Good understanding of and experience with the Agile and Waterfall methodologies of the Software Development Life Cycle (SDLC).
- Highly motivated self-learner with a positive attitude and a willingness to learn new concepts and accept challenges.
AREAS OF EXPERTISE:
Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Sqoop, Pig, Spark, HBase, Oozie.
Programming Languages: Java, Scala, Python, SQL.
AWS Technologies: S3, EMR, Redshift, Athena, Glue.
Databases: SQL Server, MySQL, Oracle, DB2, Teradata.
NoSQL Databases: HBase, MongoDB, Cassandra.
IDEs & Utilities: Eclipse, IntelliJ.
Development Methodologies: Agile, Waterfall Model.
PROFESSIONAL EXPERIENCE:
Confidential, DE
Hadoop/Spark Developer
Responsibilities:
- Ingested gigabytes of clickstream data from external servers such as FTP servers and S3 buckets on a daily basis using custom input adapters.
- Created Sqoop scripts to import/export user profile data between RDBMSs and the S3 data lake.
- Developed various Spark applications in Scala to perform enrichments of user behavioral (clickstream) data merged with user profile data.
- Involved in data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for downstream model learning and reporting.
- Utilized the Spark Scala API to implement batch processing jobs.
- Troubleshot Spark applications to improve error tolerance.
- Fine-tuned Spark applications/jobs to improve efficiency and overall pipeline processing time.
- Built Kafka producers using the Producer API to send live-stream data to various Kafka topics.
- Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into HBase.
- Utilized Spark's in-memory capabilities to handle large datasets.
- Used broadcast variables in Spark, along with effective and efficient joins, transformations, and other capabilities, for data processing.
- Experienced in working with EMR cluster and S3 in AWS cloud.
- Created Hive tables, loaded and analyzed data using Hive scripts; implemented partitioning, dynamic partitions, and bucketing in Hive.
- Involved in continuous integration of the application using Jenkins.
- Interacted with the infrastructure, network, database, application and BA teams to ensure data quality and availability
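A minimal pure-Python sketch of the enrichment logic described above (field names are hypothetical; in the pipeline this ran as Spark transformations, with the small profile table shipped to executors as a broadcast variable):

```python
# Clickstream enrichment sketch: a small user-profile table becomes a
# lookup dict (the role a Spark broadcast variable plays), and each click
# event is joined against it map-side.

def enrich_clicks(click_events, user_profiles):
    # Build the lookup once; Spark would broadcast this to every executor.
    profile_by_id = {p["user_id"]: p for p in user_profiles}
    enriched = []
    for event in click_events:
        profile = profile_by_id.get(event["user_id"], {})
        enriched.append({
            **event,
            "segment": profile.get("segment", "unknown"),
            "country": profile.get("country", "unknown"),
        })
    return enriched

clicks = [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/cart"},
    {"user_id": 9, "page": "/home"},  # no matching profile
]
profiles = [
    {"user_id": 1, "segment": "premium", "country": "US"},
    {"user_id": 2, "segment": "trial", "country": "DE"},
]
result = enrich_clicks(clicks, profiles)
```

Because the profile side fits in memory, the join needs no shuffle; events with no matching profile fall back to "unknown" rather than being dropped.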
Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce.
Confidential, NJ
Spark Developer
Responsibilities:
- Extensively worked on migrating data from traditional RDBMSs to HDFS.
- Ingested data into HDFS from Teradata and MySQL using Sqoop.
- Involved in developing Spark applications to perform ELT-style operations on the data.
- Migrated existing MapReduce jobs to Spark transformations and actions using Scala, Spark RDDs, DataFrames, and the Spark SQL API.
- Utilized Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
- Involved in creating Hive external tables to perform ETL on data produced on a daily basis.
- Validated the data being ingested into Hive for further filtering and cleansing.
- Developed Sqoop jobs for performing incremental loads from RDBMSs into HDFS and further applied Spark transformations.
- Loaded data into Hive tables from Spark using the Parquet columnar format.
- Created Oozie workflows to automate and productionize the data pipelines.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Tracked and documented operational problems using JIRA, following standards and procedures.
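The incremental-load pattern behind the Sqoop jobs above can be illustrated in a few lines of pure Python (table and column names are hypothetical): only rows whose check-column value exceeds the last recorded watermark are pulled, then the watermark advances.

```python
# Incremental import logic analogous to Sqoop's
# `--incremental append --check-column id --last-value N`:
# fetch only rows beyond the saved watermark, then advance it.

def incremental_load(source_rows, last_value, check_column="id"):
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    new_watermark = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_watermark

table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

batch1, wm = incremental_load(table, last_value=0)   # full first load
table.append({"id": 4, "v": "d"})                    # a new row arrives
batch2, wm = incremental_load(table, last_value=wm)  # picks up only id=4
```

Persisting the watermark between runs (Sqoop stores it in the saved-job metastore) is what makes each daily run pull only the delta.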
Environment: HDP (Hortonworks Data Platform), Spark, Scala, Sqoop, Oozie, Hive, CentOS, MySQL, Oracle DB, Flume
Confidential, AZ
Hadoop Developer
Responsibilities:
- Involved in writing Spark applications in Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate output per the requirements.
- Developed data pipelines using Spark, Hive, and Sqoop to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Performance-tuned Spark jobs by changing configuration properties and using broadcast variables.
- Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs.
- Worked with different file formats, including text, SequenceFiles, Avro, Parquet, JSON, XML, and flat files, using MapReduce programs.
- Developed a daily process for incremental import of data from DB2 and Teradata into Hive tables using Sqoop.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Analyzed the SQL scripts and designed the solution to implement them using Scala.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems.
- Exported the analyzed data to relational databases using Sqoop for visualization and for generating reports for the BI team.
- Extensively used HiveQL to query data in Hive tables and loaded data into HBase tables.
- Extensively worked with partitions, dynamic partitioning, and bucketed tables in Hive; designed both managed and external tables and worked on optimization of Hive queries.
- Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Assisted the analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
- Designed Oozie workflows for job scheduling and batch processing.
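Hive distributes a bucketed table's rows across a fixed number of files by hashing the clustering column modulo the bucket count; a pure-Python sketch of that assignment (the column name and bucket count are illustrative, and Hive uses its own per-type hash rather than Python's):

```python
# Hive-style bucketing: bucket = hash(key) % num_buckets.
# CPython's hash() of a small int is the int itself, which keeps this
# example deterministic.

NUM_BUCKETS = 4

def bucket_for(key, num_buckets=NUM_BUCKETS):
    return hash(key) % num_buckets

rows = [{"customer_id": cid} for cid in range(10)]
buckets = {}
for row in rows:
    buckets.setdefault(bucket_for(row["customer_id"]), []).append(row)
```

Because equal keys always land in the same bucket, two tables bucketed the same way on the join key can be joined bucket-by-bucket without a full shuffle, which is what makes bucketed joins cheap.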
Environment: Java, Scala, Apache Spark, MySQL, CDH, IntelliJ IDEA, Hive, HDFS, YARN, MapReduce, Sqoop, Pig, Flume, Unix shell scripting, Python, Apache Kafka.
Confidential, MA
Big Data (Hadoop Developer)
Responsibilities:
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Involved in validating the aggregate table based on the rollup process documented in the data mapping.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically, and created UDFs to store specialized data structures in HBase and Cassandra.
- Hands-on experience in application development using RDBMSs and Linux shell scripting.
- Developed and updated social media analytics dashboards on a regular basis.
- Created a complete processing engine based on the Hortonworks distribution.
- Managed and reviewed Hadoop log files.
- Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement, and traffic to social media pages.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
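The kind of group-and-count aggregation the Pig Latin scripts performed on web server output files can be sketched in pure Python (the log layout here is illustrative: whitespace-separated timestamp, user, and page fields):

```python
# Sketch of a Pig-style GROUP BY / COUNT over web server log lines:
# parse each line, group by page, and count hits, skipping malformed rows.

from collections import Counter

def page_hits(log_lines):
    hits = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) == 3:       # skip malformed lines
            _, _, page = parts
            hits[page] += 1
    return hits

logs = [
    "2015-01-01T10:00 u1 /home",
    "2015-01-01T10:01 u2 /home",
    "2015-01-01T10:02 u1 /cart",
    "malformed line",
]
hits = page_hits(logs)
```

In Pig this is a LOAD, a FILTER for well-formed rows, a GROUP BY page, and a COUNT; the runtime compiles those operators into MapReduce jobs.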
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Oracle 11g, Scala, HDFS, Eclipse.
Confidential, MD
Java Developer
Responsibilities:
- Gathered requirements from end users and created functional requirements.
- Contributed to process flow analysis of the functional requirements.
- Developed the graphical user interface for the user self-service screen.
- Implemented the four-eyes principle and created a quality-check process reusable across all workflows at the overall platform level.
- Developed UI models using HTML, JSP, JavaScript, Web Link, and CSS.
- Developed Struts Action classes and validation classes using the Struts controller component and the Struts validation framework.
- Supported end-user training, testing, and documentation.
- Implemented backing beans for handling UI components and storing their state in a scope.
- Worked on implementing stateless session EJBs for communicating with the controller.
- Implemented database integration using Hibernate and utilized Spring with Hibernate for mapping to the Oracle database.
- Worked on Oracle PL/SQL queries to select, update, and delete data.
- Used Maven for build automation and Git for version control.
Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, SAX, Rational Rose, UML.
Confidential
Java Developer
Responsibilities:
- Involved in developing the application on the Java/J2EE platform. Implemented the Model View Controller (MVC) structure using Struts.
- Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per the requirements, and for providing client-side JavaScript validations and server-side Bean Validation (JSR 303).
- Used Spring Core annotations for dependency injection.
- Used Hibernate as the persistence framework, mapping ORM objects to tables using Hibernate annotations.
- Responsible for writing the various service classes and utility APIs used across the framework.
- Used Axis to implement Web Services for integration of different systems.
- Developed Web Services components using XML, WSDL, and SOAP with a DOM parser to transfer and transform data between applications.
- Exposed various capabilities as Web Services using SOAP/WSDL.
- Used SoapUI for testing the web services by sending SOAP requests.
- Used an AJAX framework for server communication and a seamless user experience.
- Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla browsers through WebDriver.
- Used client-side JavaScript (jQuery) for designing tabs and dialog boxes.
- Created UNIX shell scripts to automate the build process and perform regular jobs like file transfers between different hosts.
- Used Log4j for logging output to files.
- Used JUnit in Eclipse for unit testing of various modules.
- Involved in production support, monitoring server and error logs, anticipating potential issues, and escalating them to higher levels.
Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, JavaBeans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML.