Hadoop/Spark Developer Resume
Newark, DE
SUMMARY:
- 8+ years of IT experience in design, development, maintenance and support of Big Data applications and Java/J2EE.
- Over 4 years of experience with Big Data Hadoop core and ecosystem components like HDFS, MapReduce, YARN, Hive, Impala, Sqoop, Flume, Oozie, HBase, ZooKeeper and Pig.
- Exposure to Spark, Spark Streaming, Spark MLlib and Scala, including creating DataFrames in Spark with Scala.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
- Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
- Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience and knowledge of real time data analytics using Spark, Kafka and Flume.
- Hands-on experience capturing data from existing relational databases (Oracle, MySQL, SQL Server and Teradata) that provide SQL interfaces, using Sqoop.
- Hands-on experience with SequenceFile, RCFile, Avro, Parquet and JSON file formats, and with combiners, counters, dynamic partitions and bucketing for best practices and performance improvement.
- Skilled in developing Java MapReduce programs using the Java API, and in using Hive and Pig to perform data analysis, data cleaning and data transformation.
- Worked with join patterns and implemented Map side joins and Reduce side joins using Map Reduce.
- Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
- Designed Hive queries and Pig scripts to perform data analysis and data transfer, and designed tables to load data into the Hadoop environment.
- Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
- Extensive experience on importing and exporting data using stream processing platforms like Flume and Kafka.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data via partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Experience with the data workflow schedulers ZooKeeper and Oozie to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experienced in performance tuning and real-time analytics in both relational databases and NoSQL databases (HBase).
- Worked on implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
- Experience with MongoDB, Cassandra and various NoSQL databases like HBase, Neon, Redis, etc.
- Experience in setting up the Hadoop clusters, both in-house and as well as on the cloud.
- Profound experience working with Cloudera (CDH4 & CDH5), Hortonworks Hadoop distributions and Amazon EMR on multi-node clusters.
- Exposure towards simplifying and automating big data integration with graphical tools and wizards that generate native code using Talend.
- Exposure in using build tools like Maven, Sbt.
- Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO).
- Good understanding of all aspects of Testing such as Unit, Regression, Agile, White-box, Black-box.
- Good knowledge of Web/Application Servers like Apache Tomcat, IBM WebSphere and Oracle WebLogic.
- Experience as a Java developer in client/server technologies using J2EE Servlets, JSP, JDBC and SQL.
- Expertise in designing and developing enterprise applications for the J2EE platform using MVC, JSP, Servlets, JDBC, Web Services and Hibernate, and in designing web applications using HTML5, CSS3, AngularJS and Bootstrap.
- Adept in Agile/Scrum methodology and familiar with the SDLC life cycle from requirement analysis to system study, design, testing, debugging, documentation and implementation.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, producing, documentation and production support.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
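The Hive partitioning and bucketing work described above follows a standard pattern; a minimal illustrative sketch is below (table, column and database names are made up for the example, not taken from any actual project):

```sql
-- Daily-partitioned, user-bucketed table: a predicate on the partition
-- column prunes the scan to one day, and bucketing on user_id enables
-- bucket-based joins.
CREATE TABLE clickstream (
  user_id  BIGINT,
  page_url STRING,
  ts       TIMESTAMP
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic-partition load from a raw staging table.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE clickstream PARTITION (dt)
SELECT user_id, page_url, ts, to_date(ts) AS dt
FROM raw_clickstream;

-- The WHERE clause on dt prunes all other partitions.
SELECT page_url, COUNT(*) AS hits
FROM clickstream
WHERE dt = '2017-06-01'
GROUP BY page_url;
```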
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, HDFS, MapReduce, YARN, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Kafka, Spark.
Operating System: Windows, Linux, Unix.
Database Languages: SQL, PL/SQL, Oracle.
Programming languages: Scala, Java.
Databases: IBM DB2, Oracle, SQL Server, MySQL, RDBMS, HBase, Cassandra.
Frameworks: Spring, Hibernate, JMS.
IDE: Eclipse, IntelliJ.
Tools: TOAD, SQL Developer, ANT, Log4J.
Web Services: WSDL, SOAP, REST.
ETL Tools: Talend ETL, Talend Studio.
Web/App Server: UNIX server, Apache Tomcat, WebSphere, WebLogic.
Methodologies: Agile, Waterfall, UML, Design Patterns.
PROFESSIONAL EXPERIENCE:
Confidential - Newark, DE
Hadoop/Spark Developer
Responsibilities:
- Developed Spark applications using Java and Python, and implemented an Apache Spark data-processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Consumed XML messages using Kafka and processed the XML with Spark Streaming to capture UI updates.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
- Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
- Implemented Elastic Search on Hive data warehouse platform.
- Good understanding of Cassandra architecture, replication strategies, gossip, snitches, etc.
- Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per business requirements.
- Used the Spark DataStax Cassandra Connector to load data to and from Cassandra.
- Experienced in creating data models for client data sets and analyzing data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
- Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
- Used HiveQL to analyze partitioned and bucketed data, and executed Hive queries on Parquet tables stored in Hive to perform data analysis that met business requirements.
- Used Kafka functionalities like distribution, partition, replicated commit log service for messaging systems by maintaining feeds.
- Used Apache Kafka to aggregate web log data from multiple servers and make them available in Downstream systems for analysis.
- Experience using Avro, Parquet, RCFile and JSON file formats; developed UDFs in Hive.
- Developed Autosys jobs for scheduling.
- Experience working with Apache SOLR for indexing and querying.
- Created custom SOLR Query segments to optimize ideal search matching.
- Worked with Log4j framework for logging debug, info & error data.
- Performed transformations such as event joins, bot-traffic filtering and pre-aggregations using Pig.
- Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
- Wrote several MapReduce jobs using the Java API.
- Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig and MapReduce cluster access for new users.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Generated various kinds of reports using Power BI and Tableau based on client’s requirements.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Worked with Network, database, application and BI teams to ensure data quality and availability.
- Prepared ITSM documents (Implementation Plan, DMIO document, Runbook, PT Metrics) and obtained sign-off from the respective teams to implement the code in production.
- Assisted in deployment and provided technical and operational support during installs.
- Provided post-implementation support.
- Coordinated with the offshore team.
- Reviewed code developed by the offshore team and validated the test results.
- Ensured the overall quality of all deliverables within the timelines.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
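The JSON-flattening preprocessing above was done with Spark DataFrames; the core flattening idea can be sketched in plain Python (function and field names here are illustrative, not from the actual job):

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested JSON document into a single-level
    dict with dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # descend into nested object
        else:
            flat[name] = value                 # leaf value: emit dotted column
    return flat

record = json.loads('{"user": {"id": 7, "geo": {"city": "Newark"}}, "event": "click"}')
print(flatten(record))
# {'user.id': 7, 'user.geo.city': 'Newark', 'event': 'click'}
```

In the Spark version the same effect is typically achieved by selecting nested fields into top-level columns before writing the flat file.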
Environment: Spark, Spark SQL, Cloudera, HDFS, Hive, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, shell scripting, Linux, MySQL, Oracle Enterprise DB, Jenkins, Eclipse, Oracle, Git, Oozie, SOAP, NiFi, Cassandra and Agile methodologies.
iSIGMA - Norcross, GA
Hadoop/Spark Developer
Responsibilities:
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; the initial version was done in Python (PySpark).
- Worked on implementing the Spark Framework, a Java-based web framework.
- Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
- Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets.
- Worked on the Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
- Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded the logs into MongoDB for further analysis.
- Expert knowledge of MongoDB NoSQL data modeling, tuning and disaster-recovery backups; used MongoDB for distributed storage and processing with CRUD operations.
- Extracted and restructured the data into MongoDB using import and export command line utility tool.
- Extracted files from MongoDB through FLUME and placed in HDFS and processed.
- Used Amazon DynamoDB to gather and track the event based metrics.
- Experience setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Implemented custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
- Worked with Apache SOLR to implement indexing and wrote Custom SOLR query segments to optimize the search.
- Wrote Java code to format XML documents and uploaded them to the Solr server for indexing.
- Experienced with Apache Solr for indexing and load-balanced querying to search for specific data in larger data sets; implemented a near-real-time Solr index on HBase and HDFS.
- Experience in creating, dropping and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Experience in working with different join patterns and implemented both Map side and Reduce Side Joins.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported logs from web servers with Flume to ingest the data into HDFS. Using Flume and Spool directory loading the data from local system to HDFS.
- Installed and configured pig, written Pig Latin scripts to convert the data from Text file to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Loading Data into HBase using Bulk Load and Non-bulk load.
- Installed and configured Talend ETL on single- and multi-server environments.
- Experience in monitoring Hadoop cluster using Cloudera Manager, interacting with Cloudera support and log the issues in Cloudera portal and fixing them as per the recommendations.
- Experience in Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
- Worked on continuous Integration tools Jenkins and automated jar files at end of day.
- Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
- Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial data into HDFS for analysis.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
- Experience in setting up the whole app stack; set up and debugged Logstash to send Apache logs to AWS Elasticsearch.
- Used Impala connectivity from the User Interface (UI) and queried the results using Impala.
- Wrote and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
- Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Experienced in designing RESTful services using Java-based APIs like Jersey.
- Worked in Agile development environment having KANBAN methodology. Actively involved in daily scrum and other design related meetings.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
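The multi-hop, fan-out Flume topology described above is configured declaratively; a minimal sketch follows (agent, host and path names are placeholders, not the production configuration):

```properties
# One source fanned out to two channels (replicating selector), with an
# Avro sink forwarding to the next agent (multi-hop) and an HDFS sink.
agent1.sources  = weblog-src
agent1.channels = mem-ch-1 mem-ch-2
agent1.sinks    = avro-sink hdfs-sink

agent1.sources.weblog-src.type = exec
agent1.sources.weblog-src.command = tail -F /var/log/httpd/access_log
# Replicating selector copies every event to both channels (fan-out).
agent1.sources.weblog-src.selector.type = replicating
agent1.sources.weblog-src.channels = mem-ch-1 mem-ch-2

agent1.channels.mem-ch-1.type = memory
agent1.channels.mem-ch-2.type = memory

# Hop to the next Flume agent over Avro.
agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.hostname = collector01
agent1.sinks.avro-sink.port = 4141
agent1.sinks.avro-sink.channel = mem-ch-1

# Parallel branch straight into HDFS, bucketed by day.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/weblogs/%Y-%m-%d
agent1.sinks.hdfs-sink.channel = mem-ch-2
```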
Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, Solr, Impala, MySQL, Sqoop, Kafka, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux (Ubuntu), Scala, Ab Initio, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, Agile methodologies.
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple Map Reduce jobs in java for data cleaning and processing.
- Also used Spark SQL to handle structured data in Hive.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
- Wrote and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
- Involved in migrating tables from RDBMS into Hive tables using SQOOP and later generate particular visualizations using Tableau.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
- Involved in transforming data from Mainframe tables to HDFS, and HBase tables using Sqoop
- Defined the Accumulo tables and loaded data into tables for near real-time data reports.
- Created the Hive external tables using Accumulo connector.
- Written Hive UDFs to sort Structure fields and return complex data type.
- Used different data formats (Text and ORC) when loading data into HDFS.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
- Ability to spin up different AWS instances including EC2-classic and EC2-VPC using cloud formation templates.
- Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
- Imported the data from different sources like AWS S3, LFS into Spark RDD.
- Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
- Created files and tuned SQL queries in Hive using Hue.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD's.
- Experienced in working with spark eco system using Spark SQL and Scala queries on different formats like Text file, CSV file.
- Designed Power BI data visualizations using cross tabs, maps, scatter plots, pie, bar and density charts.
- Configured and deployed Azure automation scripts for a multitude of applications using the Azure stack (including Compute, Web & Mobile, Blobs, ADF, Resource Groups, Azure Data Lake, HDInsight clusters, Azure Data Factory, Azure SQL, Cloud Services and ARM), services and utilities, focusing on automation.
- Expert in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data.
- Worked with Kerberos and integrated it into the Hadoop cluster to make it more robust and secure from unauthorized access.
- Ingested data into HBase using the HBase shell as well as the HBase client API.
- Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling and error handling.
- Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
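The Hive/SQL-to-Spark-RDD conversions mentioned above were done in Spark; the shape of turning a SQL GROUP BY into RDD-style map/reduceByKey transformations can be sketched in plain Python (data and names are illustrative, and `reduce_by_key` is a stand-in for Spark's `reduceByKey`):

```python
from functools import reduce
from collections import defaultdict

# Rows as (state, amount) pairs -- a stand-in for a Hive table.
rows = [("DE", 10), ("GA", 5), ("DE", 7), ("WI", 3)]

# SQL: SELECT state, SUM(amount) FROM sales GROUP BY state
# RDD-style: map rows to key/value pairs, then reduce values per key.
def reduce_by_key(pairs, fn):
    grouped = defaultdict(list)
    for k, v in pairs:          # the "map" side: bucket values by key
        grouped[k].append(v)
    return {k: reduce(fn, vs) for k, vs in grouped.items()}  # the "reduce" side

totals = reduce_by_key(rows, lambda a, b: a + b)
print(totals)  # {'DE': 17, 'GA': 5, 'WI': 3}
```

In actual Spark the same query is `rdd.map(lambda r: (r.state, r.amount)).reduceByKey(lambda a, b: a + b)`, with the shuffle handled by the cluster.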
Environment: Hadoop, Cloudera, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Hbase, Apache Spark, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, Talend, HUE, HCATALOG, Flume, Solr, Git, Maven.
Confidential - Milwaukee, WI
Hadoop/Java Developer
Responsibilities:
- Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
- Optimized Hive queries using partitioning and bucketing techniques to control the data distribution.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system. Worked with NoSQL database Hbase to create tables and store data.
- Responsible for importing log files from various sources into HDFS using Flume.
- Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats
- Used SQL queries, stored procedures, user-defined functions (UDFs) and database triggers, with tools like SQL Profiler and Database Tuning Advisor (DTA).
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Performed joins, group by and other operations in MapReduce by using Java and PIG.
- Created a customized BI tool for the manager team that performs query analytics using HiveQL.
- Created Hive Generic UDF's, UDAF's, UDTF's in Java to process business logic that varies based on policy.
- Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
- Managed Hadoop clusters include adding and removing cluster nodes for maintenance and capacity needs.
- Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
- Worked with business stakeholders, application developers, DBAs and production support teams.
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Assisted in managing and reviewing Hadoop log files.
- Assisted in loading large sets of structured, semi-structured and unstructured data.
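The join work described above used Java MapReduce and Pig; the map-side (replicated) join pattern can be sketched in plain Python, where the small table lives in memory on each mapper so no shuffle/reduce step is needed (data and field names are made up for the example):

```python
# Small dimension table, broadcast to every mapper: customer_id -> tier.
small_table = {"c1": "Gold", "c2": "Silver"}

# Large fact table records: (order_id, customer_id, amount).
orders = [("o1", "c1", 120.0), ("o2", "c2", 35.5), ("o3", "c1", 9.99)]

def map_side_join(order):
    """Join one large-table record against the in-memory small table,
    entirely inside the map phase -- no reduce-side shuffle."""
    order_id, customer_id, amount = order
    tier = small_table.get(customer_id)  # in-memory lookup
    return (order_id, customer_id, tier, amount)

joined = [map_side_join(o) for o in orders]
print(joined)
# [('o1', 'c1', 'Gold', 120.0), ('o2', 'c2', 'Silver', 35.5), ('o3', 'c1', 'Gold', 9.99)]
```

A reduce-side join, by contrast, tags records from both tables with their join key and lets the shuffle bring matching keys together at the reducers; it handles two large tables but costs a full shuffle.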
Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Hue, Hortonworks, Java, Storm, Zookeeper, Informatica, AVRO Files, SQL, ETL, Cloudera Manager, MySQL, MongoDB.
Confidential
Java/ETL Developer
Responsibilities:
- Prepared the Functional Requirement Specification and performed coding, bug fixing and support.
- Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, data modeling, analysis, architecture design and development for the project.
- Designed the front-end applications, user interactive (UI) web pages using web technologies like HTML, XHTML, and CSS.
- Implemented GUI pages by using JSP, JSTL, HTML, XHTML, CSS, JavaScript, AJAX
- Involved in creation of a queue manager in WebSphere MQ along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
- Developed SOAP based Web Services for Integrating with the Enterprise Information System Tier.
- Use ANT scripts to automate application build and deployment processes.
- Involved in design, development and Modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules into the application.
- Used Struts MVC architecture and SOA to structure the project module logic.
- Developed ETL processes to load data from Flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Have good Informatica ETL development experience in an offshore and onsite model and involved in ETL Code reviews and testing ETL processes.
- Scheduling the sessions to extract, transform and load data in to warehouse database on Business requirements.
- Used the Struts MVC framework for developing a J2EE-based web application.
- Extensively used Java multi-threading to implement batch Jobs with JDK 1.5 features.
- Designed an entire messaging interface and Message Topics using WebLogic JMS.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
- Migrated datasource passwords to encrypted passwords using Vault tool in all the JBoss application servers.
- Used Spring Framework for Dependency injection and integrated with the Hibernate framework.
- Developed Session Beans which encapsulates the workflow logic.
- Used JMS (Java Messaging Service) for asynchronous communication between different modules.
- Developed web components using JSP, Servlets and JDBC.
Environment: Java, J2EE, JDBC, Servlets, HTML, XHTML, CSS, JavaScript, Ajax, MVC, Informatica, ETL, PL/SQL, Struts 1.1, Spring, JSP, JMS, JBoss 4.0, SQL Server 2000, Ant, CVS, Hibernate, Eclipse, Linux
Confidential
Java/J2EE Developer
Responsibilities:
- Developed the J2EE application based on Service-Oriented Architecture, employing SOAP and other tools for data exchanges and updates.
- Developed the functionalities using Agile Methodology.
- Used Apache Maven for project management and building the application.
- Used Restful API and SOAP web services for internal and external consumption.
- Used Spring ORM module for integration with Hibernate for persistence layer.
- Involved in writing Hibernate Query Language (HQL) for persistence layer.
- Used Spring MVC, Spring AOP, Spring IOC, Spring Transaction and Oracle to create Club Systems Component.
- Wrote backend jobs based on Core Java & Oracle Data Base to be run daily/weekly.
- Coding the core modules of the application compliant with the Java/J2EE coding standards and Design Patterns.
- Written Java Script, HTML, CSS, Servlets, and JSP for designing GUI of the application.
- Worked on server-side and middle-tier technologies, extracting caching strategies/solutions.
- Designed the data access layer using J2EE Data Access Layer patterns, implementing the MVC architecture with the Struts Framework to handle databases across multiple locations and display information in the presentation layer.
- Used XPath for parsing the XML elements as part of business logic processing.
