Big Data Developer Resume

Phoenix, AZ

SUMMARY:

  • Certified Hadoop Developer with around 9 years of professional IT experience, including the Big Data ecosystem, Spark, Kafka, HBase, Solr, Scala and Java.
  • Excellent experience in Hadoop architecture and its components, such as HDFS, YARN and the MapReduce programming paradigm.
  • In-depth experience in Spark programming using Scala.
  • Excellent experience in real-time data processing using Spark Streaming and Kafka.
  • Experience in performing ETL using Spark and Spark SQL.
  • Extensive experience in NoSQL databases like HBase.
  • Experience in search platforms like Solr.
  • Experience in writing Custom Input Formats in both MR1 and MR2.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs and MapReduce scripts using Java and Python.
  • Experienced in scrubbing and formatting data using Hive and Pig scripts.
  • Experienced in importing the data using Flume from different source systems.
  • Experience in NoSQL databases like Cassandra.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experience in Hadoop distributions: Cloudera and Hortonworks.
  • Hands-on experience managing Hadoop clusters using Ambari.
  • Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
  • Developed a NiFi flow prototype for data ingestion into HDFS.
  • Hands-on experience managing Hadoop clusters using Cloudera Manager.
  • Very good experience in the complete project life cycle (design, development, testing and implementation) of client-server and web applications using Servlets and JSPs, Struts and Spring MVC.
  • Experience working with Oracle, SQL Server and MySQL databases.
  • Experience in developing web applications using Spring MVC with Knockout and AngularJS.
  • Experience in developing web services using JAX-RS and JAX-WS.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, HBase, Hadoop MapReduce, ZooKeeper, Hive, Pig, NiFi, Sqoop, Flume, Oozie, Cassandra, Hortonworks, Spark, Kafka.

Languages: Java, JAX-WS, JAX-RS, Scala

Frameworks: Spring MVC, Spring Jdbc, Struts, Hibernate.

Methodologies: Agile, Waterfall.

Database: Oracle, SQL Server and MySQL.

Web Tools: HTML, CSS, XML, JavaScript, jQuery, AngularJS, Servlets, JSPs.

IDEs: Eclipse.

Operating System: Windows, UNIX, Linux

Scripts: JavaScript, jQuery.

Design: UML, Visio.

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ

Big Data Developer

Responsibilities:

  • Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and from CSV and XML files.
  • Handled data cleansing and transformation tasks using Spark with Scala and Hive.
  • Implemented data consolidation using Spark and Hive to generate data in the required formats, applying ETL tasks for data repair, massaging data to identify its source for audit purposes, filtering data and storing it back to HDFS.
  • Proficient in AWS services like VPC, EC2, S3, ELB, Redshift, Auto Scaling Groups (ASG), EMR, RDS, IAM, CloudWatch, CloudFront and CloudTrail.
  • Experience in building and configuring a virtual data center in the AWS cloud to support enterprise data warehousing, including VPC, route tables and Elastic Load Balancing.
  • Loaded all data sets from source CSV files into Hive using Spark and into Cassandra using Spark/PySpark.
  • Migrated the computational code from HQL to PySpark.
  • Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
  • Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation in PySpark.
  • Developed Micro Services to retrieve data from the frontend system for various data retrieval patterns.
  • Created Hive tables on top of the clean data and created Tableau reports on the final tables.
  • For incremental loads, handled duplicate and source-updated records using Sqoop merge and a three-step Hive process that eliminates duplicates and keeps the most up-to-date data for reporting.
  • Built an ingestion framework using Apache NiFi to ingest files, including financial data, from SFTP into HDFS.
  • Developed scripts to load log data using Flume and store it in HDFS on a daily basis.
  • Worked on real-time data processing using Spark Streaming and Kafka with Scala.
  • Used Spark Scala RDDs to transform and filter log lines containing “ERROR”, “FAILURE” or “WARNING”, then stored the results in HDFS.
  • Created Hive tables for the data loaded into HDFS, applied context n-gram functionality and generated trigram frequencies for the given data sets.
  • Submitted the trigram frequency data to Data Science teams, who applied various NLP algorithms to refine models for understanding near-failure behavior in hardware.
  • Once the data is consumed by the Data Science teams, the Hive tables are dropped and the process is repeated daily.
  • Rewrote Informatica ETL logic in Spark and Scala, using the Spark DataFrames API for data transformations and ETL jobs and Spark SQL for processing data according to BI aggregation and reporting needs.
  • Stored reporting-ready data (dimensions, facts) in HDFS and then exported it to the BI reporting database with Sqoop.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Conducted data comparison tests between Informatica and the newly designed solution.

Environment: SPARK, Python/PySpark, Spark-Streaming, Spark-SQL, Kafka, SCALA, JAVA, HIVE, Hortonworks (HDP 2.0), Micro Services, OOZIE, CDH 5, HDFS, FLUME, PIG, HBase, Nifi, SQOOP, AWS (EC2, EMR and S3), Informatica, Linux.

Confidential, VA

Big Data Developer

Responsibilities:

  • Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs.
  • Built pipelines to move hashed and un-hashed data from Azure Blob to Data Lake.
  • Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.
  • Collaborated on insights with Data Scientists, Business Analysts and Partners.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Python.
  • Created pipelines to move data from on-premises servers to Azure Data Lake.
  • Used pandas DataFrames in Python for data analysis.
  • Enhanced and optimized Spark scripts to aggregate, group and run data mining tasks.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Used the Spark API over Hadoop YARN to perform analytics on data and monitor scheduling.
  • Implemented schema extraction for Parquet and Avro file formats.
  • Experienced in performance tuning of Spark applications, including setting the right batch interval time, the correct level of parallelism and memory tuning.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Built specific functions to ingest columns into Schemas for Spark Applications.
  • Experienced in handling large data sets using partitions, Spark in-memory capabilities, and efficient joins and transformations during the ingestion process itself.
  • Developed data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional data sources for data access and analysis.
  • Analyzed SQL scripts and designed the solution to implement using PySpark.
  • Used reporting tools like Power BI for generating data reports daily.
  • Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation and production support.

Environment: Hadoop (HDFS/Azure HDInsight), HIVE, YARN, Python/Spark, Linux, Scala, MS SQL Server, Power BI.

Confidential, Albany, NY

Hadoop Developer

Responsibilities:

  • Worked closely with the Source System Analysts and Architects to identify the attributes and convert the business requirements into technical requirements.
  • Actively involved in setting up coding standards and preparing low-level and high-level documentation.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Implemented the Hive queries for aggregating the data and extracting useful information by sorting the data according to required attributes.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Evaluated the use of ZooKeeper for cluster coordination services.
  • Involved in loading data from UNIX file system to HDFS.
  • Assisted in migrating from On-Premises Hadoop Services to cloud based Data Analytics using AWS.
  • Developed MapReduce jobs to validate and implement business logic.
  • Used Avro SerDes to handle Avro-format data in Hive and Impala.
  • Used Solr/Lucene to develop an open-source enterprise search platform in a testing and development environment.
  • Used AWS remote computing services such as S3, EC2.
  • Experienced in implementing Spark RDD transformations and actions for business analysis; worked with Spark accumulators and broadcast variables.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Involved in tuning the Cassandra cluster by changing parameters for read operations, compaction, memory cache and row cache.
  • Worked on the Oozie workflow engine for job scheduling.
  • Imported required tables from RDBMS to HDFS using Sqoop, and used Storm/Spark Streaming and Kafka to stream real-time data into HBase.
  • Experience in designing ETL solutions using Informatica PowerCenter tools such as Designer, Repository Manager, Workflow Manager and Workflow Monitor.
  • Gained knowledge of cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Utilized Apache Hadoop by Hortonworks to monitor and manage the Hadoop Cluster.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Converted the feasible business requirements to technical tasks in Design Documents.
  • Worked according to production environment configuration and functional change requests.
  • Involved in unit-level and integration-level testing and prepared supporting documents for proper deployment.

Environment: Hadoop (HDFS/Hortonworks), AWS S3, EC2, PIG, HIVE, UDF, SQOOP, Datastax Cassandra, Scala/Spark, Linux, Storm, Solr, Lucene, Kafka, Impala.

Confidential, Hamilton, NJ

Hadoop Developer

Responsibilities:

  • Developed scripts and UDFs using both Spark SQL and Spark Core in Scala for data aggregation and queries, and verified their performance against MR jobs.
  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
  • Wrote job workflows as per the requirements and their dependencies.
  • Used Sqoop to dump data from relational database into HDFS for processing.
  • Implemented partitioning, dynamic partitioning and bucketing in Hive for efficient data access.
  • Hands on experience in copying log files into HDFS from Greenplum using Flume.
  • Developed and implemented Pig UDFs to preprocess the data for analysis.
  • Wrote custom MapReduce code, generated JAR files for user-defined functions and integrated them with Hive to help the analysis team with statistical analysis.
  • Used Zookeeper for job synchronization.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Prepared Avro schema files for generating Hive tables and shell scripts for executing Hadoop commands in a single execution.
  • Wrote queries in Datastax Cassandra for searching, sorting and generating the data.
  • Gained working knowledge of NoSQL databases (HBase and Datastax Cassandra).
  • Used Sqoop to load data from DB2 into HBase.
  • Provided in-depth technical and business knowledge to ensure efficient design, programming, implementation and ongoing support for the application.
  • Responsible for complete SDLC management using Agile Methodology.
  • Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.

Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, HIVE, PIG, HBase, Solr, Sqoop, Spark/Scala, Flume, Oozie, UNIX Shell Scripting, SQL, Eclipse.

Confidential

Java/J2EE Developer

Responsibilities:

  • Part of core development team involved in the Re-engineering activities.
  • Designed, implemented and tested the Spring Domain model for the services using Core JAVA.
  • Participated in a feasibility study on JSF MVC architecture for the project.
  • Wrote custom support modules for upgrade implementation using PL/SQL and Unix shell scripts.
  • JSF Migration - Worked on the re-engineering effort to convert the proprietary servlet-based application to a JSF-based MVC architecture.
  • Spring Introduction - Involved in hands-on programming for the core product development using J2EE, JSF and Spring.
  • POJO Architecture - Re-engineered the application using IoC principles, moving from a heavyweight to a lightweight model by removing Enterprise Java Beans and reworking the business model with a simple POJO-based architecture.
  • Participated in converting services to web services using Axis.
  • Developed and implemented MVC architecture using JSF and Spring.
  • Implemented AJAX functionality using RichFaces Components.
  • Implemented custom converters and validators in JSF.
  • Involved in writing the ANT scripts to build and deploy the application.
  • Developed automated build scripts that check out the code from CVS and build the application using Apache ANT.
  • Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
  • Implemented interaction with the Oracle database using Hibernate.
  • Used XSL/XSLT for transforming and displaying reports. Developed Schemas for XML.

Environment: Java, J2EE, Spring MVC, AJAX, JAXB, JavaScript, Hibernate, Oracle 10g, Toad, XML, Ant 1.7, Log4J, SQL, PL/SQL, RAD, WebSphere.

Confidential

Java Developer

Responsibilities:

  • Responsible for and active in analysis, design, implementation and deployment across the full Software Development Life Cycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Developed Struts action classes, action forms and performed action mapping using STRUTS framework and performed data validation in form beans and action classes.
  • Extensively used STRUTS framework as the controller to handle subsequent clients and invoke the model based upon user requests.
  • Defined the search criteria, pulled the customer record from the database, made the required changes and saved the updated record back to the database.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Developed build and deployment scripts using APACHE ANT to customize WAR and EAR files.
  • Used DAO and JDBC for database access.
  • Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in post-production support and maintenance of the application.
  • Involved in analysis, design, implementation and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented database using SQL Server.
  • Designed tables and indexes.
  • Wrote complex SQL and stored procedures.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Developed user and technical documentation.

Environment: MVC, JSP, SQL, JDK1.3.8, J2EE, Dojo, Servlets, SQL Server, WebLogic, JavaScript, CSS, HTML

Confidential

Java Developer

Responsibilities:

  • Involved in all phases of Software Development Life Cycle (SDLC) such as Analysis, Design, Development, Testing and Implementation.
  • Enhanced and optimized the functionality of Web UI using AJAX, CSS, HTML and JavaScript.
  • Implemented Spring MVC framework for developing J2EE based web application.
  • Used Spring for its MVC features, dependency injection, AOP and pluggability with Hibernate. Used Maven to add dependencies.
  • Used Log4j to capture logs, including runtime exceptions.
  • Responsible for the complete design and development of the User Administration module.
  • Involved in developing the customer form data tables, stored procedures and triggers.
  • Wrote SQL queries and PL/SQL stored procedures.
  • Used SVN for version control.
  • Developed & Used Web Services (WSDL, SOAP & UDDI) in the Application.
  • Tracked the defects during batch monitoring and worked on them to closure through JIRA.
  • Participated actively in application production launch.
  • Participated in the development and testing phases of the project. Created program specifications and unit test plans for software programs.
  • Involved in preparation of functional definition documents.
  • Involved in discussions with business users and the testing team to finalize the technical design documents.

Environment: Java 7, Design Patterns, CSS, HTML, JavaScript, Servlets, JSP, SOAP, Maven, JDBC, Spring, Hibernate, Apache Tomcat, XML, SQL, PL/SQL, Oracle, JUnit.