Sr. Hadoop/Spark Developer Resume Lexington, KY - Hire IT People

SUMMARY:

Results-driven IT professional with 8+ years of professional experience, including 5 years of expertise in Big Data systems and data analytics, and in the design and development of Java-based enterprise applications.
Excellent knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Flume, Kafka, Oozie and HBase.
Hands-on experience in programming with Java, Scala and Python.
Sound knowledge of the architecture of distributed systems and parallel processing frameworks.
Good experience in fine-tuning Spark applications to improve performance and in troubleshooting failures in Spark applications.
Highly skilled in designing and implementing end-to-end data pipelines to process and analyze massive amounts of data.
Expertise in various Hadoop distributions, primarily Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
Experience in developing production-ready Spark applications using the Spark RDD, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
Strong experience in using DStreams for Spark Streaming, accumulators, broadcast variables, different levels of caching and optimization techniques for Spark jobs.
Proficient in importing and exporting data between RDBMS and HDFS using Sqoop.
Strong knowledge of and hands-on experience in developing MapReduce jobs.
Well versed in writing Hive DDL and developing custom UDFs in Hive.
Experience in transferring streaming data from different data sources into HDFS and HBase using Apache Kafka and Flume.
Experience in using the Oozie scheduler and Unix shell scripting to automate end-to-end data workflows.
Strong knowledge of NoSQL databases; worked with HBase, Cassandra and MongoDB.
Good understanding of Hadoop Gen1/Gen2 architecture, the YARN architecture and its daemons (NodeManager, ResourceManager and ApplicationMaster), and the MapReduce programming paradigm.
Experience in working with AWS cloud services such as EMR, S3, EC2, Redshift and Athena.
Expert in SQL; worked extensively with RDBMSs such as Oracle, SQL Server, DB2, MySQL and Teradata.
Proficient with Git, Jenkins and Maven.
Strong understanding of NoSQL databases such as Cassandra, MongoDB and HBase.
Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
Good understanding of and experience with the Agile and Waterfall methodologies of the Software Development Life Cycle (SDLC).
Highly motivated self-learner with a positive attitude, a willingness to learn new concepts and a readiness to accept challenges.

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Scala, Flume, Zookeeper, Oozie

Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts

NoSQL Databases: HBase, Cassandra, MongoDB

AWS technologies: Data Pipeline, Redshift, EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting

Database: Microsoft SQL Server, MySQL, Oracle, DB2

Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat

Operating Systems: UNIX, Linux, Windows, Mac

GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS

Business Intelligence Tools: Tableau, Splunk, QlikView

Development Methodologies: Agile, V-Model, Waterfall Model, Scrum

PROFESSIONAL EXPERIENCE:

Confidential - Lexington, KY

Sr. Hadoop/Spark Developer

Roles & Responsibilities:

Ingested clickstream data from FTP servers into S3 buckets on a daily basis using customized input adapters.
Developed Sqoop jobs to import and export data between RDBMS and the S3 data store.
Developed Spark applications in Scala to enrich this clickstream data by merging it with user profile data.
Involved in data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting.
Troubleshot Spark applications to improve error tolerance.
Worked extensively on sizing Spark executors for efficient memory usage across Spark jobs.
Worked on fine-tuning Spark jobs to improve overall job performance.
Used the Spark Scala API to implement batch processing jobs.
Used the Kafka producer API to send live-stream data to various Kafka topics.
Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into HBase.
Used Spark's in-memory capabilities to handle large datasets.
Used broadcast variables, efficient joins, transformations and other Spark capabilities for data processing.
Used Spark SQL for event enrichment and to prepare various levels of user behavior summaries.
Explored machine learning techniques such as linear regression and clustering using Spark ML.
Created Hive tables and loaded and analyzed data using Hive scripts. Implemented partitioning, dynamic partitions and buckets in Hive.
Worked extensively on AWS EMR, Athena, Glue Metastore and Redshift.
Involved in continuous integration of the application using Jenkins.
Interacted with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce

Confidential - Omaha, NE

Spark Developer

Roles & Responsibilities:

Worked on migrating data from traditional RDBMS to HDFS.
Ingested data into HDFS from Teradata and MySQL using Sqoop.
Involved in developing Spark applications to perform ETL operations on the data.
Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
Used Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
Involved in creating Hive external tables to perform ETL on data produced on a daily basis.
Validated the data being ingested into Hive for further filtering and cleansing.
Developed Sqoop jobs to perform incremental loads from RDBMS into HDFS and then applied Spark transformations.
Loaded data into Hive tables from Spark using the Parquet columnar format.
Created Oozie workflows to automate and productionize the data pipelines.
Migrated MapReduce code to Spark transformations using Scala.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Worked with Tableau to connect to Impala for developing interactive dashboards.
Documented operational problems in JIRA, following standards and procedures.

Environment: Cloudera Hadoop, Spark, Scala, Sqoop, Oozie, Hive, CentOS, MySQL, Oracle DB, Flume

Confidential - San Mateo, CA

Hadoop Developer

Roles & Responsibilities:

Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
Loaded data into Spark RDDs and performed in-memory computation to generate output as per the requirements.
Developed data pipelines using Spark, Hive and Sqoop to ingest data from the data warehouse and to transform and analyze operational data.
Developed Spark and Hive jobs to summarize and transform data.
Tuned the performance of Spark applications and jobs by changing configuration properties and using broadcast variables.
Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs.
Worked with different file formats, including text, SequenceFile, Avro, Parquet, JSON, XML and flat files, using MapReduce programs.
Developed a daily process for incremental imports of data from DB2 and Teradata into Hive tables using Sqoop.
Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
Analyzed the SQL scripts and designed a solution to implement them in Scala.
Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate to MapReduce jobs.
Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions that derive business insights and solve clients' operational and strategic problems.
Exported the analyzed data to relational databases using Sqoop for visualization and report generation for the BI team.
Extensively used HiveQL queries to query data in Hive tables and loaded data into HBase tables.
Worked extensively with partitions, dynamic partitioning and bucketed tables in Hive; designed both managed and external tables; and optimized Hive queries.
Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
Designed Oozie workflows for job scheduling and batch processing.

Environment: Java, Scala, Apache Spark, MySQL, CDH, IntelliJ IDEA, Hive, HDFS, YARN, MapReduce, Sqoop, Pig, Flume, UNIX Shell Scripting, Python, Apache Kafka

Confidential - Boston, MA

Big Data/Hadoop Developer

Roles & Responsibilities:

Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
Involved in validating the aggregate table based on the rollup process documented in the data mapping.
Designed and modified database tables and used HBase queries to insert and fetch data from tables.
Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
Used Oozie operational services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in HBase and Cassandra.
Exported aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.
Involved in application development using RDBMS and Linux shell scripting.
Developed and updated social media analytics dashboards on a regular basis.
Created a complete processing engine based on the Hortonworks distribution.
Managed and reviewed Hadoop log files.
Generated insights from brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
Involved in identifying and analyzing defects, questionable functional errors and inconsistencies in output.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Oracle 11g, HDFS, Eclipse

Confidential - Herndon, VA

Java Developer

Roles & Responsibilities:

Gathered requirements from end users and created functional requirements.
Contributed to the process flow by analyzing the functional requirements.
Developed the graphical user interface for the user self-service screen.
Implemented the four-eyes principle and created a quality-check process reusable across all workflows at the platform level.
Developed UI models using HTML, JSP, JavaScript, Web Link and CSS.
Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
Supported end-user training, testing and documentation.
Implemented backing beans to handle UI components and store their state in a scope.
Created the server side of a project management application using Node.js and MongoDB.
Worked on implementing EJB stateless session beans for communicating with the controller.
Implemented database integration using Hibernate and used Spring with Hibernate for mapping to the Oracle database.
Worked on Oracle PL/SQL queries to select, update and delete data.
Used Maven for build automation and Git for version control.

Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, MongoDB, Node.js, SAX, Rational Rose, UML

Confidential

Java Developer

Roles & Responsibilities:

Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java and CSS as per the requirements, and for providing client-side JavaScript validations and server-side validation with the Bean Validation framework (JSR 303).
Used Spring Core Annotations for Dependency Injection.
Used Hibernate as the persistence framework, mapping ORM objects to tables using Hibernate annotations.
Responsible for writing the service classes and utility APIs used across the framework.
Used Axis to implement web services for integrating different systems.
Developed web service components using XML, WSDL and SOAP with a DOM parser to transfer and transform data between applications.
Exposed various capabilities as web services using SOAP/WSDL.
Used SoapUI to test web services by sending REST and SOAP requests.
Used AJAX framework for server communication and seamless user experience.
Created a test framework on Selenium and executed web tests in Chrome, IE and Mozilla Firefox through WebDriver.
Used jQuery for client-side scripting to build tabs and dialog boxes.
Created UNIX shell scripts to automate the build process and to perform regular jobs such as file transfers between different hosts.
Used Log4j to log output to files.
Used JUnit in Eclipse for unit testing of various modules.
Involved in production support: monitoring server and error logs, anticipating potential issues and escalating them to higher levels.

Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML
