Sr.Spark/Big Data Developer Resume King of Prussia, PA - Hire IT People

PROFESSIONAL SUMMARY:

8+ Years of Technical Expertise in all phases of the SDLC (Software Development Life Cycle), including Professional IT Experience in Analyzing, Designing and Building highly distributed products and working with Big Data/Hadoop, NoSQL and Java/J2EE Software Practices.
Worked on Various Diversified Enterprise Applications concentrating in Financial, Health Care and Banking Sectors as a Big Data Engineer with Good Understanding of Hadoop Frameworks and various data analyzing tools.
4+ Years of Experience working with Big Data and the Hadoop Ecosystem, with expertise in Big Data Ecosystem Components: HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, Avro, Solr, Spark, Kafka, Storm, Cassandra, Impala, Greenplum and MongoDB.
Experience in importing streaming logs and aggregating the data to HDFS through Flume.
Experience in handling various tools for Big Data analysis using Pig, Hive, Sqoop and Spark.
Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
Developed Apache Spark jobs using Scala in test Environment for faster data processing and used Spark SQL for Querying.
Experience in storing and retrieving documents in Apache Solr.
Used Oozie Scheduler system to automate the pipeline workflow and orchestrate Hive, Pig and MapReduce jobs that extract the data in a timely manner.
Good Experience in writing Spark applications using Python and Scala.
Experience building data processing pipelines using Kafka and Storm to ingest data into HDFS.
Experience with Testing MapReduce programs using MRUnit and EasyMock.
Combined Pig with Hive to create processing pipelines which can scale quite easily in place of writing low-level MapReduce jobs.
Experience working with different File formats such as flat files, ORC, Avro and JSON.
Experience in deploying NiFi Data Flows in production and integrating data from multiple sources like Cassandra and MongoDB.
Developed Spark streaming programs in Scala to transform and store the data into HDFS on the fly.
Hands-on knowledge of creating Amazon EC2 instances and S3 buckets for use with Amazon EMR.
Experienced in Hadoop data testing, data validation and data quality checks.
Used Pig to extract data, write complex data transformations, clean and process large data sets and store data in HDFS.
Worked on streaming data processing frameworks like Spark Streaming and Storm.
Widely used Spark transformations to normalize data coming from real time data sources.
Configured Kafka producers and created consumer groups to publish and subscribe to streams of records in a distributed environment in a fault-tolerant way.
Involved in converting Cassandra/Hive/SQL queries into Spark Transformations using RDDs and Scala.
Hands on experience in Sequence files, Combiners, Counters, Dynamic Partitions, Bucketing for best practice and performance improvement.
Migrated the traditional MapReduce jobs to Spark jobs to improve data processing speed.
Working Knowledge of Talend, Informatica, Maven, Git Enterprise, Jenkins, Control-M, Cron, Autosys, PuTTY and WinSCP.
Worked with Core Java and J2EE Technologies such as Servlets, JSP, Collections, Multi-Threading, Exception Handling, EJB, JDBC and Web Services.
Extensive Experience in working with SQL and NoSQL Databases such as MySQL, DB2, MongoDB and Cassandra.
Set up Solr schema, data import handler to synchronize data with the SQL database, Query suggesters and spell checking for approximate searches.
Expertise with Cloud Technologies like NiFi (transformations) and AWS S3 buckets.
Developed a data pipeline using Kafka and Spark Streaming to store data into HDFS and performed the real-time analytics on the incoming data.
Experience in Configuration, Deployment and Management of Different Hadoop Distributions like Cloudera, EMR and HortonWorks (HDP), and Good knowledge of MapR.
Expert in developing applications using Servlets, Hibernate, Spring MVC and Spring Boot Frameworks.
Explored various modules of Spark and worked with DataFrames, RDDs and SparkContext.
Communicate to Operations regarding project status. Utilize Microsoft Outlook and Excel as Tools to communicate and support Production with various Information needed.
Worked in ITIL environment, Incident, Change and Problem management via ServiceNow.
Expertise in using version control tools like GitHub and SVN.
Actively Collaborated with Team members on Daily Scrum meetings to ensure smooth progress in development and on-time completion of sprints.
Experience in implementation of the SDLC process with different project management methodologies including Agile.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, Big Data, HDFS, MapReduce, YARN, Hive, Sqoop, Pig, Kafka, Flume, Spark, Impala, Oozie, ZooKeeper, Mahout, MongoDB, Cassandra, HBase, Avro, Storm, NiFi, Parquet and Snappy

Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Amazon AWS(EMR), HortonWorks, MapR, Apache and Azure

Languages: Java, Python, Scala, SQL, HTML, DHTML, JavaScript, Hive QL, XML, C/C++ and Unix Shell Scripting

NoSQL Databases: Cassandra, MongoDB, HBase and Neo4j

XML Technologies: XML, XSD, XSLT, DTD, JAXP (SAX, DOM)

IDE (Development/Build Tools): Eclipse, Maven, IntelliJ, NetBeans, Jenkins, PuTTY, WinSCP, Stream Weaver, ServiceNow, JUnit and Log4j

ETL Tools: Talend, Informatica and IBM DataStage

Java, J2EE & Frameworks: Core Java, Servlets, JDBC, Struts, Web Services (REST & SOAP), JSON, Spring and Hibernate

RDBMS: Teradata, Oracle 9i, 10g, 11g, MySQL, PL/SQL, Tomcat and MS SQL Server

Version Control: GitHub, SVN and CVS

Methodologies: Agile(Scrum), Waterfall

Operating Systems: Unix, Linux, Mac OS and Windows Variants.

PROFESSIONAL EXPERIENCE:

Confidential, King of Prussia, PA

Sr. Spark/Big Data Developer

Responsibilities:

Understood Business needs, analyzed Functional Specifications and mapped them to Design and Development.
Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and RDDs.
Used Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR) and Amazon Elastic Compute Cloud (EC2).
Involved in Data ingestion into HDFS using Sqoop for full load and Flume for Incremental load from a variety of sources like web servers, RDBMS and Data APIs.
Developed the configuration files for Flume source, Channel and sink for creating pipelines from various data sources into HDFS.
Consumed Real time and near real time data coming from various data sources through Kafka data pipelines and applied various transformations to normalize the data, which was then stored in the HDFS data lake.
Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further Analysis.
Used Sqoop to import data from different RDBMS systems like Oracle, DB2 and loaded into HDFS.
Developed Oozie workflows and scheduled them through a scheduler on a monthly basis.
Developed workflow in Oozie to automate the tasks of loading the data into HDFS and Pre-processing with Pig.
Customized the Fusion index pipeline and Query Pipeline and wrote custom stages to manipulate the Solr queries.
Involved in creating Hive ORC Tables, Loading the data into it and writing Hive Queries to analyze the Data.
Extensively worked with Spark DataFrames for ingesting data from flat files into RDDs to transform unstructured data and structured data.
Created the SparkSQL context to load data from Hive tables into RDDs for performing complex queries and analytics on data present in data lake.
Used Spark transformations for data wrangling and ingesting the real-time data of various file formats.
Very Good understanding of Partitions, bucketing concepts in Hive and Designed both managed and External tables in Hive to Optimize Performance.
Monitored the Hadoop cluster continuously using Cloudera Manager and wrote shell scripts to automate emails to the Business team.
Expertise in creating TWS Jobs and Job streams and automating them as per schedule.
Involved in data transfer from Hive tables into Cassandra file system for real time exploration.
Involved in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which one best suits the current requirements.
Performance tuning using Partitioning, bucketing of Hive tables.
Exported data from Impala to Tableau reporting tool, created dashboards on live connection.
Configured the Hive Metadata and CatalogD to make it possible for Impala daemon to pull data using Hive metadata.
Good understanding of the DAG for the entire Spark application flow as shown in the Spark Web UI.
Analyzed and performed data integration using Talend open integration suite.
Created concurrent access for Hive tables with shared and exclusive locking, which can be enabled in Hive with the help of the ZooKeeper implementation in the cluster.
Ran many performance tests using the Cassandra-Stress tool in order to measure and improve the read and write performance of the cluster.
Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
Involved in the Software Development Life Cycle (SDLC) and in developing UML diagrams like Use Case Diagrams, Class Diagrams and Sequence Diagrams to represent the detailed design phases.
Followed Agile Methodology and SCRUM meetings to track, optimize and tailor features to customer needs.

Environment: Java J2EE, Hadoop, AWS, Spark, Scala, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, ZooKeeper, MapReduce, Sqoop, LINUX, MapR, Big Data, UNIX Shell Scripting, Storm, Agile.

Confidential - Minnetonka, MN

Sr. Hadoop Developer

Responsibilities:

Deployment of Hadoop Cluster (HDInsight) and Data pipelines using Big Data analytic tools.
Worked closely with the Data source team to understand the scale and format of data to be ingested on a daily basis.
Used Spark over Hortonworks Hadoop YARN for performing transformations and analytics on Hive tables.
Designed complex ETL systems using SQL Server and NoSQL in Python and migration from various databases to Azure Blob storage.
Wrote Lambda functions in Python for Azure which invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
Imported and Exported the data from RDBMS to HDFS Data lake and HDFS to Teradata using Sqoop Import, Sqoop incremental Import and Sqoop Export functionalities and scheduled the jobs on a daily basis with Shell Scripting.
Used Sqoop import functionality for loading Historical data present in a Relational Database system into Hadoop Distributed File System (HDFS).
Extensively used Solr indexing to enable searching on non-primary key columns from the Cassandra keyspaces.
Analyzed the SQL scripts and Designed the Solution to Implement Using PySpark.
Efficiently joined raw data with the reference data using Pig scripting.
Used various file formats like Parquet, Avro, ORC and compression techniques like Snappy, LZO and GZip for efficient management of cluster resources.
Wrote Hadoop MapReduce jobs using the Java API for processing data present on HDFS.
Imported the historical data present in MongoDB using Sqoop import and stored it in HDFS using compression techniques.
Expert knowledge in MongoDB NoSQL data modelling, tuning, disaster recovery and Backup.
Processed unstructured files such as XML and JSON files using a custom-built Java API and pushed them into MongoDB.
Developed processes to integrate events data from NiFi Transformations and finally load to AWS S3 buckets.
Worked on migrating the old Java stack to the Typesafe stack using Scala for Backend Programming.
Used Slick to query and store data in the database in idiomatic Scala style, using the Scala collection framework.
Worked on MongoDB NoSQL data Modeling, tuning, disaster recovery and backup, and used it for distributed storage and processing using CRUD operations.
Wrote Shell Scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
Fetched and Generated monthly reports, Visualization of those reports using Tableau.
Developed a Flume ETL job for handling data with an HTTP Source and an HDFS Sink.
Used AVRO, Parquet file formats for serialization of data.
Followed Agile Methodology and SCRUM meetings to track, optimize and tailor features to customer needs.

Environment: Apache Hadoop, Pig, Hive, Sqoop, Spark, Spark Streaming, SparkSQL, Kafka, MapReduce, HDFS, LINUX, Oozie, MongoDB, Solr, AWS, Tableau, NiFi, RabbitMQ, Agile.

Confidential - Troy, MI

Hadoop Developer

Responsibilities:

Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports, which were used to improve campaign targeting and efficiency.
Responsible for building scalable distributed data solutions using Hadoop.
Used Oozie workflows and enabled email alerts for any failure cases.
Developed Simple to complex MapReduce Jobs that are implemented using Hive and Pig.
Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior and used UDFs to implement business logic in Hadoop.
Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
Managed and reviewed Hadoop Log Files, and deployed and maintained the Hadoop Cluster.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL.
Supported HBase Architecture Design with the Hadoop Architect group to build up a Database Design in HDFS.
Experience in creating, dropping and altering tables at run time without blocking updates and queries using HBase.
Wrote Flume configuration files for importing streaming log data into HBase with Flume.
Experience implementing solutions using one or more Azure PaaS services like Web Sites, SQL Azure Database, Storage and Cloud Services.
Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
Loaded the data to HBase using Pig, Hive and Java APIs.
Handled incoming messages using the Play Framework MVC architecture.
Managed and reviewed Hadoop log files to identify issues when jobs fail.
Analyzed large data sets by running Hive queries and Pig scripts.
Implemented Frameworks using Java and Python to automate the ingestion flow.
Worked on tuning the performance on Pig queries.
Mentored analyst and test team for writing Hive Queries.
Troubleshot issues, and managed and reviewed data backups and Hadoop log files.

Environment: Java J2EE, Hadoop, AWS, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, MapReduce, Sqoop, LINUX, HBase, Scala, Spark, MapR, Big Data, UNIX Shell Scripting, Storm, Agile.

Confidential - Springfield, IL

Hadoop Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Analyzed large data sets by running Hive Queries and Pig Scripts.
Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into RDBMS through Sqoop.
Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading and storing of data.
Created custom new columns depending on the use case while ingesting the data into the Hadoop data lake using PySpark.
Experience in building CI/CD methodology in Azure using technologies like Jenkins.
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Involved in creating Hive tables, loading and analyzing data using Hive Queries.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Worked with application teams to install Hadoop updates, patches and version upgrades as required.
Implemented test scripts to support test driven development and continuous integration.
Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Java, J2EE, SOAP, XML, JMS and JBoss.
Exported the analyzed data to the relational databases using Sqoop to generate reports for the BI team.
Implemented Oozie workflow engine to run multiple Hive and Python jobs.
Used Sqoop, Pig, Hive as ETL tools for pulling and transforming data.
Managed and reviewed Hadoop Log Files. Used Scala to integrate Spark with Hadoop.
Migrated data existing in Hadoop cluster into Spark and used SparkSQL and Scala to perform actions on the data.
Wrote Shell Scripts to automate rolling day-to-day processes.
Troubleshot issues, and managed and reviewed data backups and Hadoop log files.

Environment: Java J2EE, Hadoop, Spark, AWS, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, MapReduce, Scala, Sqoop, LINUX, MapR, Big Data, PySpark, UNIX Shell Scripting, Storm, Agile.

Confidential

Java/J2EE Developer

Responsibilities:

Designed and developed rich front-end screens using JSF, JSP, Docker, CSS, HTML and jQuery.
Developed Managed beans and defined Navigation rules for the application using JSF.
The application we developed is based on microservices Architecture.
Worked on generating the web services classes by using SOA, WSDL, UDDI and SOAP.
Responsible for developing Use case diagrams, Class diagrams, Sequence diagrams and process flow diagrams for the modules using UML and Rational Rose.
Configured the Hibernate mapping files for mapping the domain objects to the database tables and their corresponding properties to the table columns.
Queries for accessing data were built using the Hibernate API.
Used Java Messaging Services (JMS) for reliable and asynchronous exchange of essential information such as payment status report to MQServer using MQSeries.
Used RAD as IDE for development, build, deployment and testing the application.
Experience with Java microservices in Spring.
Used Log4j framework for logging the application.
Used Maven for build and deployment.
Used SVN as a version control tool and used WebSphere server.
Performed some Unit Testing on the application and the web services before its release to QA.
Documented and communicated test results to the team lead on a daily basis.
Tested the whole module using SOAPUI.
Involved in writing database connection classes for interacting with Oracle database. Incorporated Singleton Pattern to implement the database access classes.
Involved in development of Staffing sub-modules like Staffing Override, Interview Override, Resume Upload.
Performed Analysis and development of Stateless Session Bean, Data Access object and Application Component for Screening and Shortlisting module.
Configured JBoss Application Server and deployed the web components into the server.
Involved in debugging, testing and integration of the system.
Worked on fixing bugs raised by the users.
Worked with Spring Restful Web Services to interact with the JPA Objects created using ORM tools.
Documented all the low-level design of the Application.
Developed JSP / Action servlet classes.
Designed and developed user interfaces using JSP, JavaScript and HTML.
Developed Hibernate XML Java object-to-database mapping documents.

Environment: Core Java, J2EE, EJB, JSP, HTML, JavaScript, Hibernate, RESTful Web services, Eclipse, UNIX.
