Hadoop/Spark Developer Resume CA - Hire IT People

PROFESSIONAL SUMMARY:

Over 8 years of professional IT experience, including 5+ years with big data technologies such as Hadoop and Spark.
Extensively worked on Spark, Hive, Pig, MapReduce, Sqoop, Kafka, Oozie, HBase, YARN and Hue.
Hands on experience in programming using Java, Python, Scala and SQL.
Sound knowledge of architecture of Distributed Systems and parallel processing frameworks.
Designed and implemented end-to-end data pipelines to process and analyze massive amounts of data.
Experienced in working with various Hadoop distributions mainly Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
Experienced in developing production-ready Spark applications using the Spark RDD, DataFrame, Spark SQL and Spark Streaming APIs.
Worked extensively on fine-tuning Spark applications to improve performance and on troubleshooting failures in Spark applications.
Strong experience in using DStreams for Spark Streaming, accumulators, broadcast variables, different levels of caching and optimization techniques for Spark jobs.
Proficient in importing/exporting data from RDBMS to HDFS using Sqoop.
Strong knowledge and hands on experience in developing MapReduce jobs.
Developed Ad-hoc queries to load and transform data in Hive.
Well versed in writing Hive DDLs and developing custom UDFs in Hive.
Designed and implemented Hive and Pig UDFs using Java for evaluation, filtering, loading and storing of data.
Solid experience in working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
Experience in transferring streaming data from different data sources into HDFS and HBase using Apache Kafka and Flume.
Experienced in using Oozie schedulers and Unix scripting to implement cron jobs that execute different kinds of Hadoop actions.
Strong knowledge of NoSQL databases; worked with HBase, Cassandra and MongoDB.
Extensively used various IDEs such as IntelliJ, NetBeans and Eclipse.
Experienced in working with cloud services such as EMR, S3, EC2, Redshift and Athena.
Expert in SQL; worked extensively with RDBMSs such as Oracle, SQL Server, DB2, MySQL and Teradata.
Worked with Apache NiFi to ingest data into HDFS from a variety of sources.
Good understanding of Hadoop Gen1/Gen2 architecture, the YARN architecture and its daemons (NodeManager, ResourceManager and ApplicationMaster), and the MapReduce programming paradigm.
Proficient with Git, Jenkins and Maven.
Developed core modules in large cross-platform applications using Java, JSP, JDBC, JavaScript, XML and HTML.
Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
Good understanding of and experience with Agile and Waterfall methodologies of the Software Development Life Cycle (SDLC).
Highly motivated self-learner with a positive attitude who is willing to learn new concepts and accept challenges.

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Scala, Flume, Zookeeper, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts

AWS technologies: Data Pipeline, Redshift, EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting.

Database: Microsoft SQL Server, MySQL, Oracle, DB2

Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat

IDEs & Utilities: Eclipse, IntelliJ, NetBeans

Operating Systems: UNIX, Windows, Mac, LINUX

GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS

Business Intelligence tools: Tableau, Splunk, QlikView

Methodologies: Agile, V-Model, Waterfall Model, Scrum

PROFESSIONAL EXPERIENCE:

Confidential, CA

Hadoop/Spark Developer

Responsibilities:

Ingested gigabytes of clickstream data daily from external sources such as FTP servers and S3 buckets using custom, home-grown input adapters.
Created Sqoop scripts to import/export data from RDBMS to S3 data store.
Developed various Spark applications in Scala to enrich this clickstream data by merging it with user profile data.
Involved in data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting.
Automated the data flow between systems and managed the flow of information using Apache NiFi.
Utilized the Spark Scala API to implement batch processing jobs.
Troubleshot Spark applications to improve error tolerance.
Fine-tuned Spark applications and jobs to improve the efficiency and overall processing time of the pipelines.
Used the Kafka producer API to send live-stream data into various Kafka topics.
Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into HBase.
Utilized Spark's in-memory capabilities to handle large datasets.
Used broadcast variables, efficient joins, transformations and other Spark capabilities for data processing.
Used Spark SQL for event enrichment and to prepare various levels of user behavior summaries.
Experienced in working with EMR cluster and S3 in AWS cloud.
Created Hive tables and loaded and analyzed data using Hive scripts. Implemented partitioning, dynamic partitions and buckets in Hive.
Used AWS Redshift to generate business reports as per requirements.
Involved in continuous integration of the application using Jenkins.
Interacted with the infrastructure, network, database, application and BA teams to ensure data quality and availability

Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce

Confidential, MO

Spark Developer

Responsibilities:

Extensively worked on migrating data from traditional RDBMS to HDFS.
Ingested data into HDFS from Teradata and MySQL using Sqoop.
Involved in developing Spark applications to perform ETL operations on the data.
Converted existing MapReduce jobs to Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
Utilized Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
Involved in creating Hive external tables to perform ETL on data that is produced on a daily basis.
Validated the data being ingested into Hive for further filtering and cleansing.
Developed Sqoop jobs to perform incremental loads from RDBMS into HDFS and then applied Spark transformations.
Loaded the data into Hive tables from Spark using the Parquet columnar format.
Created Oozie workflows to automate and productionize the data pipelines.
Migrated MapReduce code to Spark transformations using Spark and Scala.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Worked with Tableau to connect to Impala for developing interactive dashboards.
Documented operational problems in JIRA, following standards and procedures.

Environment: Cloudera Hadoop, Spark, Scala, Sqoop, Oozie, Hive, CentOS, MySQL, Oracle DB, Flume

Confidential, AZ

Hadoop Developer

Responsibilities:

Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
Loaded data into Spark RDDs and performed in-memory computation to generate the output as per the requirements.
Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze operational data.
Developed Spark jobs, Hive jobs to summarize and transform data.
Tuned Spark applications to improve performance.
Tuned Spark jobs by changing configuration properties and using broadcast variables.
Streamed data in real time using Spark with Kafka. Responsible for handling streaming data from web server console logs.
Worked with different file formats such as text, sequence files, Avro, Parquet, JSON, XML and flat files using MapReduce programs.
Developed daily process to do incremental import of data from DB2 and Teradata into Hive tables using Sqoop.
Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
Analyzed the SQL scripts and designed the solution to implement using Scala.
Solved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate to MapReduce jobs.
Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions that derive business insights and solve clients' operational and strategic problems.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Extensively used HiveQL to query data in Hive tables and loaded data into HBase tables.
Worked extensively with partitions, dynamic partitioning and bucketed tables in Hive, designed both managed and external tables, and optimized Hive queries.
Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
Designed Oozie workflows for job scheduling and batch processing.

Environment: Java, Scala, Apache Spark, MySQL, CDH, IntelliJ IDEA, Hive, HDFS, YARN, MapReduce, Sqoop, Pig, Flume, Unix Shell Scripting, Python, Apache Kafka.

Confidential

Big Data/Hadoop Developer

Responsibilities:

Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements; delivered the BRD and TDD documents.
Involved in validating the aggregate table based on the rollup process documented in the data mapping.
Designed and modified database tables and used HBase queries to insert and fetch data from tables.
Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
Used Oozie Operational Services for batch processing and for scheduling workflows dynamically, and created UDFs to store specialized data structures in HBase and Cassandra.
Hands on experience in application development using RDBMS, and Linux shell scripting.
Developed and updated social media analytics dashboards on regular basis.
Created a complete processing engine based on the Hortonworks distribution.
Managed and reviewed Hadoop log files.
Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Oracle 11g, Scala, HDFS, Eclipse

Confidential, IL

Java Developer

Responsibilities:

Gathered requirements from end users and created functional requirements.
Contributed to process flows by analyzing the functional requirements.
Developed the graphical user interface for the user self-service screen.
Implemented the four-eyes principle and created a quality check process reusable across all workflows at the overall platform level.
Development of UI models using HTML, JSP, JavaScript, Web Link and CSS.
Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
Supported end-user training, testing and documentation.
Implemented backing beans to handle UI components and store their state in a scope.
Worked on implementing EJB Stateless sessions for communicating with Controller.
Implemented database integration using Hibernate and used Spring with Hibernate for mapping to the Oracle database.
Worked on Oracle PL/SQL queries to select, update and delete data.
Worked with Maven for build automation. Used Git for version control.

Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, SAX, Rational Rose, UML.

Confidential

Java Developer

Responsibilities:

Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) architecture using Struts.
Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java and CSS as per the requirements, and for providing client-side JavaScript validations and server-side validation with the Bean Validation framework (JSR 303).
Used Spring Core Annotations for Dependency Injection.
Used Hibernate as persistence framework mapping the ORM objects to table using Hibernate annotations.
Responsible for writing the service classes and utility APIs used across the framework.
Used Axis to implement web services for integrating different systems.
Developed Web services component using XML, WSDL and SOAP with DOM parser to transfer and transform data between applications.
Exposed various capabilities as Web Services using SOAP/WSDL.
Used SoapUI to test the RESTful web services and to send SOAP requests.
Used AJAX framework for server communication and seamless user experience.
Created a test framework on Selenium and executed web testing in Chrome, IE and Mozilla through WebDriver.
Used client-side JavaScript with jQuery to build tabs and dialog boxes.
Created UNIX shell scripts to automate the build process and to perform regular jobs such as file transfers between hosts.
Used Log4j to log output to files.
Used JUnit with Eclipse for unit testing of various modules.
Involved in production support: monitored server and error logs, anticipated potential issues and escalated them to higher levels.

Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML.
