Sr. Big Data/Hadoop Developer Resume

Troy, NY

SUMMARY

  • 9+ years of experience as a Big Data/Hadoop developer with hands-on experience in Big Data/Hadoop environments.
  • In depth experience and good knowledge in using Hadoop ecosystem tools like MapReduce, HDFS, Pig, Hive, Kafka, Yarn, Sqoop, Storm, Spark, Oozie, and Zookeeper.
  • Excellent understanding and extensive knowledge of Hadoop architecture and various ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce programming paradigm.
  • Working experience with Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks.
  • Good knowledge of the MapR distribution and Amazon EMR.
  • Good knowledge of data modeling, use case design and object-oriented concepts.
  • Well versed in installation, configuration, supporting and managing of Big Data and underlying infrastructure of Hadoop Cluster.
  • Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming and GraphX.
  • Worked extensively with Spark Streaming and Apache Kafka to fetch live stream data.
  • Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
  • Implemented dynamic partitions and buckets in Hive for efficient data access.
  • Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Involved in integrating Hive queries into the Spark environment using Spark SQL.
  • Hands-on experience in performing real-time analytics on big data using HBase and Cassandra in Kubernetes and Hadoop clusters.
  • Experience in using Flume to stream data into HDFS.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in developing data pipelines using Flume, Sqoop and Pig to extract the data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Good knowledge in using job scheduling and coordination tools like Oozie and Zookeeper.
  • Hands-on experience working with NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop and Kubernetes clusters.
  • Proficient with cluster management and configuring Cassandra databases.
  • Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
  • Good working experience on different file formats (PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
  • Built secure AWS solutions by creating VPCs with private and public subnets.
  • Expertise in configuring Amazon Relational Database Service (RDS).
  • Worked extensively on configuring Auto Scaling for high availability.
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
  • Experience working with Java, J2EE, JDBC, ODBC, JSP, Eclipse, Java Beans, EJB and Servlets.
  • Expert in developing user interfaces using JSP, Java Swing and HTML.
  • Experience working with the Spring and Hibernate frameworks for Java.
  • Experience in using IDEs like Eclipse, NetBeans and IntelliJ.
  • Proficient in using version control tools like Git, VSS, SVN and PVCS.
  • Experience with web-based UI development using jQuery UI, jQuery, CSS, HTML, HTML5, XHTML and JavaScript.
  • Development experience with DBMSs like Oracle, MS SQL Server, Teradata and MySQL.
  • Developed stored procedures and queries using PL/SQL.
  • Hands-on experience with best practices of web services development and integration (both REST and SOAP).
  • Experience in working with build tools like Ant, Maven, SBT and Gradle to build and deploy applications to servers.
  • Expertise in Object Oriented Analysis and Design (OOAD) and knowledge in Unified Modeling Language (UML).
  • Expertise in the complete Software Development Life Cycle (SDLC) in Waterfall and Agile/Scrum models.

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena, Zeke Scheduling, Zookeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure Data Lake

NoSQL Databases: HBase, Cassandra, MongoDB

Build Management Tools: Maven, Apache Ant

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting

Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit

Version Control and CI: GitHub, Jenkins

IDE and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ

Databases: Oracle 12c/11g, Microsoft SQL Server 2016/2014, DB2, MySQL 4.x/5.x

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)

Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5

PROFESSIONAL EXPERIENCE

Confidential - Troy, NY

Sr. Big Data/Hadoop Developer

Responsibilities:

  • Utilized Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Worked closely with the business analysts to convert the business requirements into technical requirements and prepared low-level and high-level documentation.
  • Created Hive tables, loaded transactional data from Teradata using Sqoop and worked with highly unstructured and semi-structured data of 2 petabytes in size.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Responsible for developing a data pipeline with AWS to extract the data from weblogs and store it in HDFS.
  • Created various documents such as the Source-to-Target Data Mapping Document, Unit Test Cases and the Data Migration Document.
  • Imported data from structured data source into HDFS using Sqoop incremental imports.
  • Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
  • Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Expert with data modeling tools (Erwin and Embarcadero ER Studio).
  • Experience in creating and maintaining conceptual, logical and physical data models using Erwin and ER Studio.
  • Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL and Spark on YARN.
  • Developed feeds into the system using a variety of technologies, from custom code through conventional ETL tools to open source tools like Apache NiFi.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud and performed export and import of data to and from S3.
  • Analyzed different big data analytic tools, including Hive, Impala and Sqoop, for importing data from RDBMS to HDFS.
  • Involved in creating a Data Lake by extracting customer's Big Data from various data sources into Hadoop HDFS.
  • Developed SQL scripts using Spark for handling different data sets and verified the performance against MapReduce jobs.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Wrote complex SQL and PL/SQL queries for stored procedures.
  • Used S3 buckets to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, and loaded and stored data with Pig scripts and R for MapReduce operations.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked on the NoSQL databases MongoDB and HBase, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs in Scala.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it on HDFS for further analysis.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
  • Used Singleton, DAO, DTO, Session Facade, MVC design Patterns.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Environment: Agile, Hive, Teradata, Sqoop, Storm, Kafka, HDFS, AWS, Data mapping, EC2, S3, Hadoop, YARN, MapReduce, RDBMS, Data Lake, Python, Scala, DynamoDB, Flume, Pig, MongoDB, MVC

Confidential - Boston, MA

Big Data/Hadoop Developer

Responsibilities:

  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Ingested data into HDFS using Sqoop, wrote custom input adapters (Network Adapter, FTP Adapter and S3 Adapter), and analyzed the data using Spark (DataFrames and Spark SQL) and a series of Hive scripts to produce summarized results from Hadoop for downstream systems.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Hive.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformations to load data from the Hadoop Data Lake to Cassandra DB.
  • Involved in Hive partitioning and bucketing, performed joins on Hive tables and used Hive SerDes such as Regex, JSON and Avro.
  • Used software engineering skills in Java, Python, Scala and Ruby to analyze disparate, complex systems and collaboratively design new products and services.
  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Worked with the NoSQL database HBase to get real-time data analytics using Apache Spark with both Scala and Python.
  • Worked closely with the data science team in building Spark MLlib applications to build various predictive models.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Uploaded streaming data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited page on the website.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud and performed export and import of data to and from S3.
  • Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
  • Involved in designing the row key in HBase to store text and JSON as key values in the HBase table, and designed the row key so that it can be retrieved with get/scan in sorted order.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Worked on custom Talend jobs to ingest, enrich and distribute data in Cloudera Hadoop ecosystem.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in the end-to-end implementation of ETL logic.
  • Developed syllabus/curriculum data pipelines from Syllabus/Curriculum Web Services to HBase and Hive tables.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Built applications using Maven and integrated them with CI servers like Jenkins to run build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solutions from disparate sources.
  • Created cubes in Talend to produce different types of aggregations of the data and to visualize them.

Environment: Hadoop, HDFS, Spark, AWS, S3, Scala, Zookeeper, Map Reduce, Hive, Pig, Sqoop, HBase, Cassandra, MongoDB, Tableau, Java, Maven, UNIX Shell Scripting.

Confidential - Plano, TX

Sr. Java/Hadoop Developer

Responsibilities:

  • Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
  • Developed Java MapReduce programs to transform log data into a structured form to find user location, age group and time spent.
  • Implemented row-level updates and real-time analytics using CQL on Cassandra data.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data in HDFS for analysis.
  • Wrote shell scripts for key Hadoop services like Zookeeper and automated them to run using cron.
  • Developed Pig scripts for the analysis of semi-structured data.
  • Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
  • Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Implemented Capacity Schedulers on the Job Tracker to share the resources of the cluster among the MapReduce jobs submitted by users.
  • Designed and implemented MapReduce based large-scale parallel processing.
  • Developed and updated the web tier modules using the Struts 2.1 framework.
  • Modified the existing JSP pages using JSTL.
  • Implemented Struts Validator for automated validation.
  • Utilized Hibernate for object/relational mapping for transparent persistence onto SQL Server.
  • Performed building and deployment of EAR, WAR and JAR files on test and stage systems in WebLogic Application Server.
  • Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
  • Used Singleton, DAO, DTO, Session Facade, MVC design Patterns.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Wrote complex SQL and PL/SQL queries for stored procedures.
  • Developed Reference Architecture for E-Commerce SOA Environment.
  • Used UDFs to implement business logic in Hadoop.
  • Created and populated custom tables, and analyzed and maintained custom and package indexes in relation to process performance.
  • Used CVS for version controlling and JUnit for unit testing.

Environment: Eclipse, Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Oozie, MySQL, Cassandra, Java, Shell Scripting, SQL.

Confidential - Gwinn, MI

Sr. Java/J2EE Developer

Responsibilities:

  • Involved in requirement analysis, design, development and testing of the JDA Demand product.
  • Involved in the implementation of the design through vital phases of the Software Development Life Cycle (SDLC), including development, testing, implementation and maintenance support.
  • Developed front-end screens using JSP, HTML, AJAX, JavaScript, ExtJs, JSON and CSS.
  • Involved in overall system's support and maintenance services such as Bug Fixing, Enhancements, Testing and Documentation
  • Developed the persistence layer using Hibernate ORM to transparently store objects in the database.
  • Responsible for coding all the JSPs and Servlets used for the Used Module.
  • Developed the JSPs, Servlets and various Beans using WebSphere server.
  • Wrote Java utility classes common to all of the applications.
  • Analyzed and fine-tuned RDBMS/SQL queries to improve the performance of the application with the database.
  • Implemented XSLTs for transforming the XML documents in Spring Web Flow.
  • Developed a POJO-based programming model using the Spring framework.
  • Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.
  • Handled Java multithreading in the back-end component, where one thread runs for each user and serves that user.
  • Used Web Services to connect to the mainframe for validation of the data.
  • Used WSDL to expose the Web Services.
  • Participated in multiple WebEx sessions with clients/support in the process of bug fixing.
  • Developed stored procedures, triggers and functions to process the data using PL/SQL, mapped them in the Hibernate configuration file and established data integrity among all tables.
  • Involved in the upgrade of WebLogic and SQL Servers.
  • Participated in Code Reviews of other modules, documents, test cases.
  • Performed unit testing using JUnit and performance and volume testing.
  • Implemented UNIX shell scripts to deploy the application.
  • Used Oracle database for data persistence.
  • Used the Log4j framework for logging debug, info and error data.
  • Extensively worked on UNIX operating systems.
  • Used Git as the version control system.
  • Implemented the Business Services and Persistence Services to perform business logic.

Environment: JDA, SDLC, JSP, HTML, AJAX, JavaScript, JSON, Backbone JS, XSLT, XML, Spring framework, Java, Hibernate, JUnit, UNIX Shell, Oracle, Log4j framework, Git and CSS

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed interactive GUI screens using HTML, Bootstrap and JSP, and data validation using JavaScript and AngularJS.
  • Developed the UI using JSP with AJAX calls in JSP pages, and implemented business logic in Servlets and Struts action classes.
  • Identified requirement gaps and communicated with the analyst to fill those gaps.
  • Established a JSON contract for communication between the JS pages and Java classes.
  • Worked on the Groovy-based Grails framework, which makes creating complex workflows much simpler.
  • To maintain loose coupling between layers, published the business layer as services and injected the necessary dependent components using Spring IoC, and handled cross-cutting concerns like logging, user interface exceptions and transactions using Spring AOP.
  • Integrated Spring DAO for data access using Hibernate.
  • Used Spring Security for authentication and authorization of the application.
  • Implemented an asynchronous rich client based on AJAX and jQuery UI components to improve customer experience.
  • Extensively used Maven to manage project dependencies and build management.
  • Developed the UI panels using Spring MVC, XHTML, CSS, JavaScript and jQuery.
  • Used Hibernate for object-relational mapping and used JPA for annotations.
  • Integrated Hibernate with Spring using HibernateTemplate and used its provided methods to implement CRUD operations.
  • Established database connectivity using JDBC and Hibernate O/R mapping with Spring ORM for MySQL Server.
  • Used the Spring Data framework for CRUD operations on MongoDB.
  • Followed good coding standards with usage of JUnit, EasyMock and Checkstyle.
  • Handled build, integration and deployment using Maven 2 and Jenkins.
  • Consumed Web Services to interact with other external interfaces in order to exchange data in the form of XML using SOAP.
  • Involved in splitting of big Maven projects to small projects for easy maintainability.
  • Involved in deploying and testing the application in the JBoss application server.

Environment: GUI, HTML, bootstrap, Java Script, Angular JS, JSP, AJAX, Struts, Servlets, JSON, java, Hibernate, JQuery, Maven, MVC, XHTML, CSS, JPA, CRUD, JDBC, MongoDB, Jenkins, JBoss
