Hadoop/Big Data Developer Resume
Canonsburg, PA
SUMMARY:
- Currently working in a Big Data capacity with the Hadoop ecosystem across internal and cloud-based platforms.
- 8+ years of experience as a Big Data/Hadoop and Java developer, with skills in analysis, design, development, testing and deployment of various software applications.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop. Good knowledge of using Hibernate to map Java classes to database tables and of Hibernate Query Language (HQL).
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformation, pre-processing and analysis (a minimal sketch follows at the end of this summary).
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata Big Data analytics.
- Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig. Extensive experience developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) and Spark SQL as required.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper, etc.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
- Installation, configuration and administration experience on Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
- Experience working with Hortonworks and Cloudera environments.
- Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib.
- Extensive development experience in IDEs such as Eclipse, NetBeans, IntelliJ and STS.
- Strong experience in core SQL and RESTful web services (RWS).
- Strong knowledge of NoSQL column-oriented databases like HBase and their integration with the Hadoop cluster.
- Good experience with Tableau for data visualization and analysis of large datasets, drawing various conclusions.
- Experience in using Python and R for statistical analysis.
- Good knowledge of coding using SQL, SQL*Plus, T-SQL, PL/SQL and stored procedures/functions.
- Worked on Bootstrap, AngularJS, NodeJS, Knockout, Ember and Java Persistence Architecture (JPA).
- Well versed in working with relational database management systems such as Oracle 12c, MS SQL Server and MySQL.
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering to deployment and production support.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Sponsorship required to work in the US
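The Java MapReduce data-cleansing work referenced above can be illustrated with a minimal sketch of a cleansing mapper; the CSV layout, expected field count and class name are illustrative assumptions rather than code from any project listed below.

    // Minimal data-cleansing mapper: drops malformed or empty records before further processing.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CleansingMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5;   // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);   // assumed comma-delimited input
            if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
                context.write(NullWritable.get(), new Text(value.toString().trim()));
            }
        }
    }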
PROFESSIONAL EXPERIENCE:
Hadoop/Big Data Developer
Confidential - Canonsburg, PA
Responsibilities:
- As a Big Data Developer, worked on Hadoop ecosystem components including Hive, HBase, Oozie, YARN, Spark Streaming and MCS (MapR Control System) on the MapR distribution.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Primarily involved in the data migration process using Azure, integrating with a Bitbucket repository and Jenkins.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka) and Storm (see the producer sketch after this list).
- Worked on Apache Solr, which is used as the indexing and search engine.
- Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
- Worked on analyzing the Hadoop stack and different big data tools including Pig, Hive, the HBase database and Sqoop.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked with different data sources like Avro data files, XML files, JSON files, SQL Server and Oracle to load data into Hive tables.
- Used Spark to create structured data from large amounts of unstructured data from various sources.
- Implemented Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Performed transformations, cleaning and filtering on imported data using Hive, MapReduce and Impala, and loaded the final data into HDFS.
- Developed Python scripts to find vulnerabilities with SQL Queries by doing SQL injection.
- Experienced in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
- Involved in the process of data acquisition, data pre-processing and data exploration for a telecommunication project in Scala.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
- Exported event weblogs to HDFS by creating an HDFS sink which deposits the weblogs directly into HDFS.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Involved in loading data from the UNIX file system to HDFS. Involved in designing the schema, writing CQL queries and loading data using Cassandra.
- Built the automated build and deployment framework using Jenkins, Maven etc.
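A minimal sketch of a Java Kafka producer of the kind used for the real-time ingestion described above; the broker address, topic name and sample record are assumptions (MapR Streams exposes the same producer API, typically with a /stream:topic style topic path).

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class WeblogProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker and serializers are placeholder values for a local sketch.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Topic and payload are hypothetical; a MapR Streams topic would look like "/weblogs:raw".
                producer.send(new ProducerRecord<>("weblogs", "host-01", "GET /index.html 200"));
                producer.flush();
            }
        }
    }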
Big Data/Hadoop Developer
Confidential - Houston, TX
Responsibilities:
- Involved in analysis, design, system architecture design, process interface design and design documentation.
- Responsible for developing prototypes of the selected solutions and implementing complex big data projects with a focus on collecting, parsing, managing, analyzing and visualizing large sets of data using multiple platforms.
- Understood how to apply technologies to solve big data problems and to develop innovative big data solutions.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in near real time and persist it to Cassandra.
- Responsible for analyzing and cleansing raw data by running Hive queries and Pig scripts on the data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Imported data from various sources into the Cassandra cluster using Sqoop. Worked on creating data models for Cassandra from the existing Oracle data model.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Worked in Spark and Scala for data analytics. Handled an ETL framework in Spark for writing data from HDFS to Hive.
- Used a Scala-based framework for ETL.
- Developed multiple Spark Streaming and core jobs with Kafka as a data pipeline system.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for processing and storage of small data sets.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch below).
- Worked on Talend with Hadoop. Worked on migrating jobs from Informatica to Talend.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Developed Kafka producer and consumer components for real time data processing.
- Worked on physical transformations of data model which involved in creating Tables, Indexes, Joins, Views and Partitions.
- Involved in Cassandra data modeling to create keyspaces and tables in a multi-data-center DSE Cassandra DB.
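A minimal sketch, using the Spark Java API, of loading S3 data into an RDD and applying transformations and actions as described above; the bucket, path and record handling are assumptions.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class S3RddJob {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("s3-rdd-sketch");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Path is a placeholder; s3a:// requires the Hadoop AWS connector on the classpath.
                JavaRDD<String> lines = sc.textFile("s3a://example-bucket/input/*.csv");
                long validRows = lines
                        .filter(line -> !line.isEmpty())   // transformation: drop blank lines
                        .map(String::toLowerCase)          // transformation: normalize case
                        .count();                          // action: trigger the computation
                System.out.println("valid rows: " + validRows);
            }
        }
    }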
Environment: Spark, HDFS, Kafka, MapReduce (MR1), Pig, Hive, Sqoop, Cassandra, AWS, Talend, Java, Linux Shell Scripting.
Big Data/Hadoop Developer
Confidential - FL
Responsibilities:
- Performed data transformations such as filtering, sorting and aggregation using Pig.
- Created Sqoop queries to import data from SQL Server, Oracle and Teradata into HDFS.
- Created Hive tables to push the data to MongoDB.
- Wrote complex aggregate queries in MongoDB for report generation.
- Developed scripts to run scheduled batch cycles using Oozie and present data for reports.
- Worked on a POC for building a movie recommendation engine based on Fandango ticket sales data using Scala and the Spark Machine Learning (MLlib) library.
- Developed a big data ingestion framework to process multi-TB data, including data-quality checks and transformations, stored it in efficient formats such as Parquet and loaded it into Amazon S3 using the Spark Scala API (see the sketch below).
- Implemented automation, traceability and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive and Pig.
- Designed a highly efficient data model for optimizing large-scale queries utilizing Hive complex data types and the Parquet file format.
- Performed data validation and transformation using Python and Hadoop Streaming.
- Developed highly efficient Pig Java UDFs utilizing advanced concepts like the Algebraic and Accumulator interfaces to populate ADP Benchmarks cube metrics.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables.
- Developed bash scripts to bring TLOG files from the FTP server and then process them to load into Hive tables.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop data lake.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Overwrote (INSERT OVERWRITE) the Hive data with HBase data daily to get fresh data every day, and used Sqoop to load data from DB2 into the HBase environment.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using Spark-Shell and Spark Streaming.
- Designed, developed and maintained big data streaming and batch applications using Storm.
- Created Hive, Phoenix and HBase tables, and HBase-integrated Hive tables, as per the design using the ORC file format and Snappy compression.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Used Splunk to capture, index and correlate real-time data in a searchable repository from which it can generate reports and alerts.
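A minimal sketch of the ingestion step referenced above: read raw files, apply a simple data-quality check, and write Parquet to S3. The bullet mentions the Spark Scala API; the Java API is used here for consistency with the other sketches, and the paths, column name and quality rule are assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class ParquetIngest {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("parquet-ingest-sketch").getOrCreate();

            Dataset<Row> raw = spark.read()
                    .option("header", "true")
                    .csv("hdfs:///landing/tlog/");                         // assumed landing path

            Dataset<Row> clean = raw.filter(col("store_id").isNotNull()); // assumed quality rule

            clean.write()
                    .mode("overwrite")
                    .parquet("s3a://example-bucket/curated/tlog/");        // assumed target location
            spark.stop();
        }
    }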
Environment: Hadoop, HDFS, Spark, Storm, Kafka, MapReduce, Hive, Pig, Sqoop, Oozie, DB2, Java, Python, Splunk, UNIX Shell Scripting.
Hadoop Admin
Confidential - Philadelphia, PA
Responsibilities:
- Worked on Spark SQL to handle structured data in Hive.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in an AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and to move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using HUE.
- Involved in collecting and aggregating large amounts of log data using Storm and staging the data in HDFS for further analysis.
- Created the Hive external tables using the Accumulo connector.
- Managed real-time data processing and real-time data ingestion in MongoDB and Hive using Storm.
- Created custom Solr query segments to optimize search matching.
- Developed Spark scripts using Python shell commands.
- Stored the processed results in a data warehouse and maintained the data using Hive.
- Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text files and CSV files.
- Experience in writing Pig user-defined functions and Hive UDFs (see the UDF sketch below).
- Pig scripts utilized SequenceFiles and HCatalog for better performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Used Sqoop to import the data from RDBMS to HDFS to achieve data reliability.
- Implemented a POC using Apache Impala for data processing on top of Hive.
- Responsible for managing and reviewing Hadoop log files. Designed and developed a data management system using MySQL.
- Created Oozie workflow and coordinator jobs to kick off jobs on time for data availability.
- Worked with NoSQL databases like MongoDB, creating MongoDB collections to load large sets of semi-structured data.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked with and learned a great deal about Amazon Web Services (AWS) cloud services like EC2, S3 and EMR.
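A minimal sketch of a Hive UDF in Java of the kind mentioned above; it uses the classic UDF API, and the function's behaviour (trimming and upper-casing a string) is purely illustrative. Once packaged into a jar, such a function would be registered with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class NormalizeUDF extends UDF {
        // Hive calls evaluate() once per row; returning null preserves NULL semantics.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }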
Environment: HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, SOLR, GIT, Maven.
Java Developer
Confidential
Responsibilities:
- Design and development of technical specifications using design patterns and SOA methodology with UML; unit, integration and system testing.
- Developed and tested the application in the RAD development environment and deployed it to WebSphere.
- Migrated the Servlets to Spring Controllers and developed Spring Interceptors; worked on JSPs, JSTL and JSP custom tags (see the controller sketch below).
- Developed flexible, scalable applications utilizing open-source technologies like Hibernate ORM and the Spring Framework.
- Responsible for the design and implementation of various modules of the application using the Struts-Spring-Hibernate architecture.
- Responsible for writing Struts action classes and Hibernate POJO classes, and integrating Struts and Hibernate with Spring to process business needs.
- Struts tag libraries and the Struts Tiles framework were used in addition to JSP, HTML, AJAX and CSS in developing the presentation layer.
- Used the Struts Validation framework for dynamic validation of user input forms.
- Improved the Auto Quote application by designing and developing it using Eclipse, HTML, JSF, Servlets and JavaScript.
- Implemented Spring ORM wiring with Hibernate, providing access to the Oracle 10g RDBMS.
- Developed the user interface with jQuery, JSP, HTML, HTML5, CSS, CSS3 and JavaScript.
- Wrote JDBC statements, prepared statements and callable statements for various database update, insert and delete operations and for invoking functions, stored procedures and triggers.
- Implemented MVC architecture using Spring to send and receive data between the front end and the business layer.
- Used JSPs, HTML, JavaScript, and CSS for development of the web pages.
- Developed Ajax, JavaScript validation functions for client side validations.
- Developed Stateless Session EJBs to make our functionality available to other applications.
- Extensively worked on writing JUnit test cases for testing the business components developed in Spring.
- Used Agile methodology in development.
- Used SOAPUI to test the web services and mock response for unit testing web services.
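A minimal sketch of a Servlet migrated to a Spring MVC controller, as described above; the request mapping, view name and model attribute are illustrative assumptions.

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;

    @Controller
    public class QuoteController {

        // Replaces the doGet() of a legacy servlet; Spring maps the URL and binds the parameter.
        @RequestMapping("/quote")
        public String showQuote(@RequestParam("id") String quoteId, Model model) {
            model.addAttribute("quoteId", quoteId);   // exposed to the JSP view
            return "quote";                           // resolved to a JSP by the configured view resolver
        }
    }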
Environment: Java, Hibernate, JSP, JavaScript, Weblogic, Struts, EJB, Oracle 10g, Spring, JDBC, XML, HTML, CSS, JUnit, ANT, CVS, Eclipse, Agile, Test-driven development
Java Developer
Confidential
Responsibilities:
- Involved in the review and analysis of the functional specifications, requirement clarifications, defects, etc.
- Created a user-friendly GUI and web pages using HTML, CSS and JSP.
- Developed web components using the MVC pattern under the Struts framework.
- Wrote JSPs and Servlets and deployed them on the WebLogic application server.
- Used JSPs/HTML on the front end, Servlets as front controllers and JavaScript for client-side validations.
- Wrote the Hibernate mapping XML files to define the mapping between Java classes and database tables.
- Developed the UI using JSP, HTML, CSS and AJAX, and learned how to implement jQuery, JSP and client- and server-side validations using JavaScript.
- Implemented MVC architecture using Spring to send and receive data between the front end and the business layer.
- Designed, developed and maintained the data layer using JDBC and performed configuration of the Java application framework.
- Extensively used Hibernate in the data access layer to access and update information in the database (see the sketch below).
- Migrated the Servlets to Spring Controllers and developed Spring Interceptors; worked on JSPs, JSTL and JSP custom tags.
- The front-end JSP pages were developed using the Struts framework and hosted in a J2EE environment on an Apache Tomcat server.
- Developed flexible, scalable applications utilizing open-source technologies like Hibernate ORM and the Spring Framework.
- Assisted teammates in completing their assigned tasks.
- Participated in bug fixing and QA review of the code before delivering to the State.
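A minimal sketch of the kind of Hibernate data-access code referenced above; the entity, HQL query and configuration are assumptions, with the mapping presumed to live in an hbm.xml file referenced from hibernate.cfg.xml as described earlier.

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    // Plain POJO assumed to be mapped via Customer.hbm.xml (entity and fields are hypothetical).
    class Customer {
        private Long id;
        private String city;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
    }

    public class CustomerDao {
        private final SessionFactory sessionFactory =
                new Configuration().configure().buildSessionFactory(); // reads hibernate.cfg.xml

        @SuppressWarnings("unchecked")
        public List<Customer> findByCity(String city) {
            Session session = sessionFactory.openSession();
            try {
                // HQL queries the mapped class rather than the table.
                return session.createQuery("from Customer c where c.city = :city")
                              .setParameter("city", city)
                              .list();
            } finally {
                session.close();
            }
        }
    }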
Environment: HTML, JSP, JavaScript, CSS, Struts, Spring, Servlets, Design Patterns, XML, XSD, Hibernate, JUnit, Ant, jQuery, Web Services, Windows