Sr. Hadoop/Spark Developer Resume Philadelphia, PA - Hire IT People

PROFILE SUMMARY:

8+ years of experience in the IT industry designing, developing, and maintaining web-based applications using Big Data technologies such as the Hadoop and Spark ecosystems and Java/J2EE technologies.
Excellent understanding of Hadoop architecture and daemons such as NameNode, DataNode, JobTracker, and TaskTracker, as well as HDFS and MapReduce concepts.
Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce programming, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra with the Cloudera and Hortonworks distributions.
Hands on experience in various big data application phases like data ingestion, data analytics and data visualization.
Experienced in writing MapReduce programs in Java to process large data sets using map and reduce tasks.
In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
Expertise in writing Spark RDD transformations and actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
Experience in using DStreams, accumulator variables, broadcast variables, and RDD caching for Spark Streaming.
Expertise in developing Real-Time Streaming Solutions using Spark Streaming.
Expertise in using Spark-SQL with various data sources like JSON, Parquet and Hive.
Hands on Experience in working with Spark MLlib.
Experienced in developing Spark programs using the Scala and Java APIs.
Expertise in using Kafka as a messaging system to implement real-time Streaming solutions.
Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
Expertise in using Flume for collecting, aggregating, and loading log data from multiple sources into HDFS.
Scheduled various ETL processes and Hive scripts by developing Oozie workflows.
Expertise in using Custom loader functions in PiggyBank.
Expertise in implementing complex business logic by writing generic UDFs and Hive UDFs.
Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
Experience in designing both time driven and data driven automated workflows using Oozie.
Experience in handling various file formats such as Avro, SequenceFile, and Parquet.
Proficient in various NoSQL databases such as Cassandra, MongoDB, and HBase.
Good understanding of MPP databases such as Impala; created tables and wrote queries in Impala and Greenplum.
Experienced in developing and maintaining Greenplum regions, indexes, disk stores, data partitioning, and replication.
Experience in migrating ETL processes into Hadoop, designing Hive data models, and writing Pig Latin scripts to load data into Hadoop.
Experienced in using Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
Good knowledge of Cloudera distributions and of AWS services, including Amazon Simple Storage Service (Amazon S3), Amazon EC2, and Amazon EMR.
Worked on HBase to perform real time analytics and experienced in CQL to extract data from Cassandra tables.
Experience in working with different join patterns and implemented both Map side joins and Reduce Side Joins.
Implementing E2E solutions in big data using Hadoop framework.
Experienced with Kerberos authentication to provide more security to the cluster.
Experienced with Cloudera Manager to monitor health and performance of the Hadoop cluster.
Experienced in writing test cases and performing unit testing using frameworks such as JUnit, EasyMock, and Mockito.
Strong knowledge of the Informatica ETL tool, data warehousing, and business intelligence.
Knowledge of web/application servers such as Apache Tomcat, IBM WebSphere, and Oracle WebLogic.
Good knowledge of Object-Oriented Analysis and Design (OOAD) and Java design patterns.
Good level of experience in Core Java and JEE technologies such as JDBC, Servlets, and JSP.
Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, serialization, and deserialization.
Expert in developing web applications using the Struts, Hibernate, and Spring frameworks.
Hands on Experience in writing SQL and PL/SQL queries.
Ability to work with Onsite and Offshore Teams.
Good understanding and experience with Software Development methodologies like Agile and Waterfall.
Good understanding of all aspects of Testing such as Unit, Regression, Agile, White-box, Black-box.

TECHNICAL SKILLS:

BigData Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, ZooKeeper, Oozie, Flume, Kafka, Apache Spark, Spark Streaming, Spark SQL, Apache Solr, Impala, Elasticsearch.

Hadoop Distributions: Cloudera, Hortonworks and MapR.

Programming Languages: Java, Scala, Python.

Java Frameworks: Hibernate, Struts2, Spring

Scripting and Markup Languages: JavaScript, XML, HTML

Web Services: RESTful web services

RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2

NoSQL Databases: HBase, MongoDB, Cassandra

DBMS Languages: SQL, PL/SQL, MySQL, T-SQL

IDE: Eclipse, NetBeans, IntelliJ

Operating Systems: Windows variants, Unix, Linux.

Web Servers: Apache Tomcat, WebSphere, WebLogic

ETL Tools: Informatica, Pentaho

Methodologies: Waterfall, Agile, TDD

PROFESSIONAL EXPERIENCE:

Confidential, Philadelphia, PA

Sr. Hadoop/Spark Developer

Responsibilities:

Developed Pig scripts to help perform analytics on JSON and XML data.
Created Hive tables (external, internal) with static and dynamic partitions and performed bucketing on the tables to provide efficiency.
Used Hive QL to analyze the partitioned and bucketed data and compute various metrics for reporting.
Performed data transformations by writing MapReduce and Pig jobs as per business requirements.
Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for analysis.
Configured Spark Streaming to consume Kafka streams and store the data in HDFS.
Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra.
Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
Imported data from RDBMS systems like MySQL into HDFS using Sqoop.
Developed Sqoop jobs to perform incremental imports into Hive tables.
Involved in loading and transforming of large sets of structured and semi structured data.
Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
Analyzed log data using Apache Spark to predict errors.
Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
Wrote Impala queries to fetch data from Hive tables.
Developed several MapReduce jobs using the Java API.
Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
Developed Pig and Hive UDFs to implement business logic for processing the data as per requirements.
Developed Pig UDFs in Java and used UDFs from PiggyBank for sorting and preparing the data.
Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs to create data pipelines.
Implemented the project using Agile methodology and attended daily Scrum meetings.

Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Java, Impala, Web Servers, Maven Build, MySQL, AWS, Agile-Scrum.

Confidential, San Francisco, CA

BigData Engineer

Responsibilities:

Processed web server logs by developing multi-hop Flume agents using Avro sinks and loaded the logs into MongoDB for further analysis.
Implemented custom interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
Implemented custom serializers to perform encryption using the DES algorithm.
Developed collections in MongoDB and performed aggregations on the collections.
Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables, and handled structured data using Spark SQL.
Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables.
Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs.
Created HBase tables, loaded data into them using HBase sinks, and performed analytics using Tableau.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
Developed Pig Latin scripts for the analysis of semi-structured data and conducted data analysis by running Hive queries and Pig scripts.
Used codecs such as Snappy and LZO to store data in HDFS to improve performance.
Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
Created HBase tables to store data in variable formats coming from different legacy systems.
Used Hive to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
Developed Sqoop jobs to load data from RDBMS into HDFS and Hive.
Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
Worked on Kerberos authentication to establish a more secure network communication on the cluster.
Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
Worked with Network, database, application and BI teams to ensure data quality and availability.
Implemented Elasticsearch on the Hive data warehouse platform.
Worked with Amazon Elastic MapReduce (EMR) and set up the environment on AWS EC2 instances.
Experience in maintaining the cluster on AWS EMR.
Experienced in NoSQL databases such as HBase and MongoDB, and with the Hortonworks distribution of Hadoop.
Developed ETL jobs to integrate data from various sources and load it into the warehouse using Informatica 9.1.
Scheduled the ETL jobs using ESP scheduler.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, MongoDB, Flume, Apache Spark, Accumulo, Oozie, Kerberos, AWS, Tableau, Java, Informatica, Elasticsearch, Git, Maven.

Confidential, Fort Lauderdale, FL

Hadoop Developer

Responsibilities:

Handled large amounts of data from different sources and was involved in HDFS maintenance and the loading of structured and unstructured data.
Imported and exported data into HDFS and Hive using Sqoop.
Developed MapReduce jobs in Java to perform data cleansing and pre-processing.
Migrated large amounts of data from various databases such as Oracle, Netezza, and MySQL to Hadoop.
Created Hive tables, loaded data into them, and wrote Hive queries.
Performed data transformations in Hive and wrote Hive queries to perform data analysis as per the business requirements.
Created partitions and buckets on Hive tables to improve performance while running Hive queries.
Optimized and performance-tuned Hive queries.
Implemented complex transformations by writing UDFs in Pig and Hive.
Loaded and transformed structured, semi-structured, and unstructured data.
Ingested log data from various web servers into HDFS using Apache Flume.
Implemented Flume agents for loading streaming data into HDFS.
Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
Wrote several MapReduce jobs using the Java API.
Scheduled jobs using Oozie workflow Engine.
Worked on various compression techniques like GZIP and LZO.
Designed and implemented batch jobs using Sqoop, MR2, Pig, and Hive.
Implemented HBase on top of HDFS to perform real time analytics.
Handled Avro data files using Avro Tools and MapReduce.
Developed Data pipelines by using Chained Mappers.
Developed custom loaders and storage classes in Pig to work with various data formats such as JSON, XML, and CSV.
Active involvement in SDLC phases (Design, Development, Testing), Code review etc.
Active involvement in Scrum meetings and Followed Agile Methodology for implementation.

Environment: HDFS, MapReduce, Hive, Flume, Pig, Sqoop, Oozie, HBase, RDBMS/DB, Flat files, MySQL, CSV, Avro data files.

Confidential

Java/J2EE Developer

Responsibilities:

Implemented several design patterns like Observer pattern, factory pattern, singleton pattern, facade pattern etc.
Utilized Agile Methodologies to manage full life-cycle development of the project.
Interacted with the business analyst and host to understand the requirements, using Agile methodologies and Scrum meetings to keep track of and optimize for the end client's needs.
Created Use Case Diagrams, Class Diagrams, Activity Diagrams during the design phase.
Developed the project using MVC Design pattern.
Designed and Developed Server-side Components (DAO, Session Beans) Using J2EE.
Worked with Core Java concepts like Collections Framework, multithreading, memory management.
Used JDBC connectivity and JDBC statements, Prepared Statements, Callable Statements for querying, inserting, updating, deleting data from Oracle databases.
Developed Front-end Screens using HTML, CSS, and JavaScript.
Developed Date Time Picker using Object Oriented JavaScript extensively.
Performed code reviews and refactoring during development and strictly adhered to the checklist.
Used Jenkins for continuous integration and Subversion as the version control system for the application.
Used Log4j for logging and tracing the code.
Performed client-side validation using JavaScript.
Optimized XML parsers such as SAX and DOM for the production data.
Good understanding of Teradata MPP architecture, including partitioning and primary indexes.
Good knowledge in Teradata Unity, Teradata Data Mover, OS PDE Kernel internals, Backup and Recovery.
Implemented the JMS Topic to receive the input in the form of XML and parsed them through a common XSD.
Used JDBC Connections and WebSphere Connection pool for database access.
Developed and modified several Database Procedures, Triggers and views to implement the business logic for the application.
Used TOAD to monitor query turnaround times and to test all connections.
Prepared the test plans and executed test cases for unit, integration and system testing.
Developed multiple unit and integration tests using Mockito and EasyMock.
Used JIRA for reporting bugs in the application.

Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JMS, JavaScript, XSLT, HTML, CSS, SAX, DOM, XML, UML, TOAD, Mockito, Oracle, Eclipse RCP, JIRA, WebSphere, Unix/Windows.

Confidential

Java Developer

Responsibilities:

Extensive Involvement in Requirement Analysis and system implementation.
Actively involved in SDLC phases like Analysis, Design and Development.
Developed modules and assisted in deployment as per the client's requirements.
Implemented the application using JSP, with servlets implementing the business logic.
Developed utility and helper classes and server-side functionality using servlets.
Created DAO classes and wrote various SQL queries to perform DML operations on the data as per the requirements.
Created custom exceptions and implemented exception handling using try, catch, and finally blocks.
Developed user interface using JSP, JavaScript and CSS Technologies.
Implemented User Session tracking in JSP.
Involved in Designing DB Schema for the application.
Implemented Complex SQL Queries, Reusable Triggers, Functions, Stored procedures using PL/SQL.
Participated in pair programming, code reviews, and debugging.
Involved in Tool development, Testing and Bug Fixing.
Performed unit testing for various modules.
Involved in UAT and production deployments and support activities.

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MySQL, IBM WebSphere, JIRA.
