Sr. Big Data Developer Resume
Shelton, CT
SUMMARY:
- Over 7 years of experience in development, testing, implementation, maintenance and enhancements on various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing.
- Extensive experience in installing, configuring and using Big Data ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala, Spark and Zookeeper
- Expertise in using J2EE application servers such as IBM WebSphere, JBoss and web servers like Apache Tomcat.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Distributions (HDP).
- Experience in developing web services with XML-based protocols and tools such as SOAP, WSDL, UDDI and Apache Axis.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Good knowledge of API management platforms such as Azure API Management.
- Well versed in writing and deploying Oozie workflows and coordinators.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- In-depth knowledge of handling large amounts of data utilizing Spark Data Frames/Datasets API and Case Classes.
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
- Working experience in Impala, Mahout, SparkSQL, Storm, Avro, Kafka, and AWS.
- Experience with Java web framework technologies like Apache Camel and Spring Batch.
- Experience in version control and source code management tools like GIT, SVN, and BitBucket.
- Hands on experience working with databases like Oracle, SQL Server and MySQL.
- Great knowledge of working with Apache Spark Streaming API on Big Data Distributions in an active cluster environment.
- Proficiency in developing secure enterprise Java applications using technologies such as Maven, Hibernate, XML, HTML, CSS and version control systems.
- Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
- Excellent understanding of Hadoop and underlying framework including storage management.
- Good knowledge of and experience with NoSQL databases such as Cassandra and MongoDB.
- Extensive experience in building and deploying applications on web/application servers such as WebLogic, WebSphere, and Tomcat.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
- Experience using Ant for building and deploying projects to servers, and JUnit and Log4j for testing and debugging.
- Excellent communication and interpersonal skills that contribute to timely completion of project deliverables, often ahead of schedule.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper
Cloud Platform: Amazon AWS, EC2, S3, EMR, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
Hadoop Distributions: Cloudera, Hortonworks, MapR
Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Databases: Oracle 12c/11g, SQL Server, MySQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web/Application Server: Apache Tomcat 9.0.7, JBoss, Web Logic, Web Sphere
SDLC Methodologies: Agile, Waterfall
Version Control: GIT, SVN, CVS
PROFESSIONAL EXPERIENCE:
Confidential - Shelton, CT
Sr. Big Data Developer
Responsibilities:
- Responsible for building and configuring distributed data solution using MapR distribution of Hadoop.
- Applied expertise in Big Data architecture, including Hadoop distributed systems (Azure HDInsight, Hortonworks, Cloudera), MongoDB and other NoSQL stores.
- Involved in complete Big Data flow of the application data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
- Involved in Agile methodologies, daily scrum meetings, sprint planning.
- Developed a JDBC connection to get the data from Azure SQL and feed it to a Spark Job.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala (a brief sketch follows this section).
- Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Deployed the application in Hadoop cluster mode using spark-submit scripts.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Worked on the large-scale Hadoop Yarn cluster for distributed data processing and analysis using Spark, Hive.
- Implemented various Hadoop Distribution environments such as Cloudera and Hortonworks.
- Implemented monitoring and established best practices around usage of ElasticSearch.
- Worked on Installing and configuring the HDP Hortonworks and Cloudera Clusters in Dev and Production Environments.
- Worked with NoSQL databases HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Worked with Apache NiFi as an ETL tool for batch and real-time processing.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Involved in developing code to write canonical model JSON records from numerous input sources to Kafka Queues.
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
- Used GitHub as repository for committing code and retrieving it and Jenkins for continuous integration.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML and JSON.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
- Developed Python, shell/Perl and PowerShell scripts for automation, and performed component unit testing using the Azure emulator.
- Worked on importing data from MySQL to HDFS and vice-versa using Sqoop to configure Hive Metastore with MySQL, which stores the metadata for Hive tables.
- Loaded the customer's data and event logs from Kafka into HBase using REST API.
- Used Hive data warehouse modeling to interface Hadoop with BI tools such as Tableau, and enhanced the existing applications.
- Built the automated build and deployment framework using Jenkins and Maven.
- Wrote Scala based Spark applications for performing various data transformations, De-normalization, and other custom processing.
- Performed tuning of Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Created a multi-threaded Java application running on edge node for pulling the raw click stream data from FTP servers.
Environment: Big Data 3.0, Hadoop 3.0, Azure, Hortonworks 3.0, MongoDB, HDFS, Agile, Hive 2.3, SQL, Cassandra 3.0, Cloudera, Elastic Search 6.6, HBase 1.2, Apache NiFi 1.9, Sqoop 1.4, Kafka 1.1, Oozie 4.3, XML, JSON, Pig 0.17, MySQL, Scala, Spark 2.3, Java 10.
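A minimal sketch, in Scala, of the Kafka-to-HDFS Spark Streaming ingestion described in the bullets above; the broker address, topic name, consumer group, batch interval and output path are illustrative assumptions, not project values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30s batch interval (assumed)

    // Kafka consumer settings; broker and group id are hypothetical
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumer",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "clickstream" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS as text
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/clickstream/batch_${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct-stream approach lets the streaming job track Kafka offsets itself, which matches the receive-and-persist pattern listed in the responsibilities.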
Confidential - Philadelphia, PA
Sr. Spark/Hadoop Developer
Responsibilities:
- Extensively migrated existing architecture to Spark Streaming to process the live streaming data.
- Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
- Worked with multiple teams to provision AWS infrastructure for development and production environments.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Created new custom columns, depending on the use case, while ingesting data into the Hadoop data lake using PySpark.
- Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Developed time driven automated Oozie workflows to schedule Hadoop jobs.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which better suited the current requirements.
- Designed and implemented MapReduce jobs to support distributed processing using java, Hive, Scala and Apache Pig.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs (illustrated in the sketch after this section).
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
- Worked with Apache Nifi for managing the flow of data from sources through automated data flow.
- Created, modified and executed DDL and ETL scripts for De-normalized tables to load data into Hive and AWS Redshift tables.
- Used Pig to perform data validation on the data ingested with Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Extracted files from CouchDB through Sqoop and placed in HDFS and processed.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Participated in development/implementation of Cloudera Impala Hadoop environment.
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Developed Spark UDFs for date and currency type conversions and sales processing to reduce the load on Tableau.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Managed and reviewed data backups and Hadoop log files on the Hortonworks cluster.
Environment: Spark 2.4, Hadoop 2.7, Agile, AWS, Scala 2.1, Hive 1.2, SQL, Oracle, HDFS, Apache Flume 1.8, HBase 1.0, Apache NiFi 1.5, ETL, Pig 0.16, MongoDB, Cassandra 2.0, Apache Kafka 0.10, Zookeeper 3.4, Impala, Hortonworks.
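A minimal sketch, in Scala, of the kind of Hive/SQL-to-Spark conversion referenced in the bullets above; the table name, columns and target table are hypothetical examples.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    // SparkSession with Hive support so existing Hive metastore tables are visible
    val spark = SparkSession.builder()
      .appName("HiveToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (illustrative):
    //   SELECT region, SUM(amount) AS total
    //   FROM sales WHERE sale_date >= '2018-01-01'
    //   GROUP BY region
    val result = spark.table("sales")
      .filter(col("sale_date") >= lit("2018-01-01"))
      .groupBy("region")
      .agg(sum("amount").alias("total"))

    // Write the result back as a Hive table so BI tools can read it
    result.write.mode("overwrite").saveAsTable("sales_region_totals")

    spark.stop()
  }
}
```

Expressing the same aggregation through the DataFrame API keeps the logic on the existing Hive tables while letting Spark's optimizer plan the execution.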
Confidential - Newport Beach, CA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Worked on S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Involved in Installing, Configuring Hadoop Eco-System, Cloudera Manager using CDH3, CDH4 Distributions.
- Worked on No-SQL databases like Cassandra, MongoDB for POC purpose in storing images and URIs.
- Involved in creating Hive tables and applying HiveQL queries on them, which automatically invoke and run MapReduce jobs.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
- Integrated Kafka with Spark Streaming for high-speed data processing.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this section).
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the HBase API.
- Used the Enterprise Data Warehouse database to store information and make it accessible across the organization.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
- Worked on Spark for in-memory computations and compared DataFrames for optimizing performance.
- Created web service request-response mappings by importing source and target definitions using WSDL files.
- Developed custom UDFs to generate unique keys for use in Apache Pig transformations.
- Created conversion scripts using Oracle SQL queries, functions and stored procedures, test cases and plans before ETL migrations.
- Developed Shell scripts to read files from edge node to ingest into HDFS partitions based on the file naming pattern.
Environment: Hadoop 2.7, AWS, HDFS, MongoDB, Cassandra 2.0, Oozie, Sqoop, Apache Hive, Kafka 0.10, Spark, MySQL, Flume, Oracle, ETL, MapReduce, HBase 1.0, Scala 2.1, JSON, Cloudera.
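A minimal sketch, in Scala, of the MapReduce-to-Spark POC migration mentioned above, using a hypothetical log-level count; the input format, paths and application name are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogLevelCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogLevelCount"))

    // MapReduce equivalent: the mapper emits (logLevel, 1), the reducer sums counts.
    // Assumed log format: "timestamp LEVEL message"
    sc.textFile("hdfs:///data/logs/app/*.log")
      .map(_.split("\\s+", 3))
      .filter(_.length >= 2)
      .map(parts => (parts(1), 1L))   // map phase: (level, 1)
      .reduceByKey(_ + _)             // reduce phase: sum per level
      .saveAsTextFile("hdfs:///data/output/log_level_counts")

    sc.stop()
  }
}
```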
Confidential
Java/J2EE Developer
Responsibilities:
- Applied J2EE Design Patterns such as Factory, Singleton, and Business delegate, DAO, Front Controller Pattern and MVC.
- Designed Rich user Interface Applications using JavaScript, CSS, HTML and AJAX and developed web services by using SOAP.
- Implemented modules using Java APIs, Java collection, Threads, XML, and integrating the modules.
- Successfully installed and configured the IBM WebSphere Application Server and deployed the business-tier components using an EAR file.
- Designed and developed business components using Session and Entity Beans in EJB.
- Implemented CORS (Cross-Origin Resource Sharing) using Node.js and developed REST services using the Node, Express and Mongoose modules.
- Used Log4j for Logging various levels of information like error, info, debug into the log files.
- Developed integrated applications and lightweight components using the Spring Framework, and used IoC features from Spring Web MVC to configure the application context for the Spring bean factory.
- Used JDBC prepared statements called from servlets for database access.
- Used AngularJS to connect the web application to back-end APIs and used RESTful methods to interact with several APIs.
- Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
- Implemented jQuery features to develop dynamic queries to fetch data from the database.
- Involved in developing module for transformation of files across the remote systems using JSP and Servlets.
- Worked on Development bugs assigned in JIRA for Sprint following agile process.
- Used ANT scripts to fetch, build, and deploy application to development environment.
- Extensively used JavaScript to provide users with interactive, speedy, functional and more usable user interfaces.
- Implemented MVC architecture using Apache struts, JSP & Enterprise Java Beans.
- Used AJAX and JSON to make asynchronous calls to the project server to fetch data on the fly.
- Developed batch programs to update and modify metadata of large number of documents in FileNet Repository using CE APIs
- Worked on creating a test harness using POJOs which would come along with the installer and test the services every time the installer would be run.
- Worked on creating Packages, Stored Procedures & Functions in Oracle using PL/SQL and TOAD.
- Used JNDI to perform lookup services for the various components of the system.
- Deployed the application and tested on JBoss Application Server. Collaborated with Business Analysts during design specifications.
- Developed Apache Camel middleware routes, JMS endpoints, spring service endpoints and used Camel free marker to customize REST responses.
Environment: J2EE, CSS, HTML5, AJAX, SOAP, JavaScript, XML, IBM WebSphere, EJB, Log4j, jQuery, JSP, Apache Ant, JSON, POJOs, PL/SQL, TOAD, Node.js, JBoss, Apache Camel, CI/CD, Jenkins.