Sr. Hadoop/Spark Developer Resume

Dallas, TX

SUMMARY:

  • 8+ years of overall experience in data analysis, data modeling and implementation of enterprise class systems spanning Big Data, Data Integration, Object Oriented programming and Advanced Analytics.
  • Excellent understanding of Hadoop architecture, including HDFS and daemons such as NameNode, DataNode, JobTracker and TaskTracker.
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components like Hadoop, HDFS, MapReduce programming, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark and Cassandra with Cloudera and Hortonworks distributions.
  • Created custom Solr Query segments to optimize ideal search matching.
  • Used Solr Search & MongoDB for querying and storing data.
  • Extracted data from Cassandra through Sqoop, placed it in HDFS and processed it.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala.
  • Expertise in Big Data technologies and Hadoop ecosystem tools like Flume, Sqoop, HBase, ZooKeeper, Oozie, MapReduce, Hive, Pig and YARN.
  • Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
  • Developed Collections in MongoDB and performed aggregations on the collections.
  • Hands-on experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop clusters using Cloudera and Hortonworks distributions.
  • In-depth knowledge of data structures and the design and analysis of algorithms.
  • Good understanding of data mining and machine learning techniques.
  • Hands-on experience with various Hadoop distributions: IBM BigInsights, Cloudera, Hortonworks and MapR.
  • Scheduled various ETL processes and Hive scripts by developing Oozie workflows.
  • Experience in migrating ETL processes into Hadoop, designing Hive data models and writing Pig Latin scripts to load data into Hadoop.
  • Experienced in using Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
  • Expertise in using Custom loader functions in PiggyBank.
  • Experience in handling various file formats like Avro, SequenceFile and Parquet.
  • Expertise in implementing complex business logic by writing Hive UDFs and GenericUDFs.
  • Good understanding of MPP databases such as Impala; created tables and wrote queries in Impala and Greenplum.
  • Strong knowledge of the Informatica ETL tool, data warehousing and business intelligence; knowledge of web/application servers like Apache Tomcat, IBM WebSphere and Oracle WebLogic.
  • Good knowledge of Object Oriented Analysis and Design (OOAD) and Java design patterns.
  • In-depth understanding of Spark architecture including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Expertise in developing real-time streaming solutions using Spark Streaming.
  • Proficient in big data ingestion and streaming tools like Flume, Sqoop, Spark, Kafka and Storm.
  • Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing big data.
  • Expertise in implementing ad-hoc MapReduce programs using Pig scripts.
  • Experience in importing and exporting data from RDBMS to HDFS, Hive tables and HBase by using Sqoop.
  • Experience in importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
  • Exposure to using Apache Kafka to develop data pipelines of logs as streams of messages using producers and consumers.
  • Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Knowledge of handling Kafka clusters; created several topologies to support real-time processing requirements.
  • Hands-on experience in writing SQL and PL/SQL queries.
  • Ability to work with Onsite and Offshore Teams.
  • Good understanding and experience with Software Development methodologies like Agile and Waterfall.
  • Good understanding of all aspects of testing such as unit, regression, Agile, white-box and black-box.
  • Hands-on experience migrating complex MapReduce programs into Apache Spark RDD transformations.
  • Design and programming experience in developing Internet applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX and web-based development tools.
  • Experience with Enterprise JavaBeans (EJB) components; technical expertise and demonstrated high standards of skill in J2EE frameworks like Struts (MVC framework).
  • Proficient in using various IDEs and build tools like RAD, Eclipse, IntelliJ, SBT and JDeveloper.
  • Extensive experience in programming, deploying and configuring popular middle-tier J2EE application servers like BEA WebLogic 8.1, IBM WebSphere 5.0, open source Apache Tomcat and JBoss servers.
  • Good experience in client web technologies like HTML, CSS, JavaScript and AJAX, Servlets, JSP, JSON, XML, JSF and AWS.
  • Experience in Software Development Life Cycle (SDLC), OOA, OOD and OOP through implementation and testing.
  • Experience in working with Hadoop in standalone, pseudo-distributed and fully distributed modes.
  • Good at working on low-level design documents and System Specifications.

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Methodology: Agile, Waterfall

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and CSS

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL

Operating systems: UNIX, Linux, Mac OS and Windows variants

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Sr. Hadoop/Spark Developer

Responsibilities:

  • Experience in working with Hadoop clusters using Hortonworks distributions.
  • Experienced in NoSQL databases like HBase and MongoDB, and experienced with the Hortonworks distribution of Hadoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed job processing scripts using Oozie workflow.
  • Worked with NiFi for managing the flow of data from source to HDFS.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the stream in HDFS.
  • Consumed JSON messages using Kafka and processed the JSON file using Spark Streaming to capture UI updates.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Optimized HiveQL/Pig scripts by using execution engines like Tez and Spark.
  • Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Worked with and learned a great deal from AWS cloud services like EC2, S3, EBS, RDS and VPC.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for processing and storing small data sets. Experienced in maintaining the Hadoop cluster on AWS EMR.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Performed data transformations in Scala and Hive, and used partitions and buckets for performance improvements.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Used Solr Search & MongoDB for querying and storing data.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Practical experience in defining queries on JSON data using the Query DSL provided by Elasticsearch.
  • Experience in improving the search focus and quality in Elasticsearch by using aggregations.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model, which gets data from Kafka in near real time and persists it to Cassandra.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
  • Created Cassandra tables to store various data formats of data coming from different sources.
  • Helped the team to increase the Cluster size from Nodes.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
  • Wrote Hive queries and UDFs.
  • Proficient with version control tools like GitHub and SVN.
  • Hands-on experience with the Maven build tool and continuous integration with Jenkins.
  • Strong knowledge in using different utility tools like Eclipse, IntelliJ and SBT.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Gained experience in managing and reviewing Hadoop log files.

Environment: Hadoop, MapReduce, Sqoop, HDFS, HBase, Hive, Pig, Oozie, Spark, Kafka, Cassandra, AWS, Elasticsearch, Java, Oracle 10g, MySQL, Ubuntu, HDP.

Confidential, New York City, NY

BigData Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, HBase and Sqoop.
  • Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Implemented different machine learning techniques using Weka machine learning library.
  • Developed Spark program to analyze reports using Machine Learning models.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Performed analysis on implementing Spark using Scala and wrote Spark sample programs using PySpark.
  • Good exposure to development with HTML, Bootstrap and Scala.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Strong experience in working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Experienced in implementing static and dynamic partitioning in Hive.
  • Experience in customizing the MapReduce framework at different levels like input formats, data types and partitions.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
  • Implemented API tool to handle streaming data using Flume.
  • Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Involved in migrating Hive queries and Hive UDFs to Spark SQL.
  • Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports and created Sqoop jobs for the last saved value.
  • Developed custom writable MapReduce Java programs to load web server logs into HBase using Flume.
  • Responsible for Data Modeling as per our requirement in HBase and for managing and scheduling Jobs on a Hadoop cluster using Oozie jobs.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Involved in data migration from Oracle database to MongoDB.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
  • Created Hive Generic UDFs to process business logic that varies based on policy.
  • Experienced in using different compression techniques such as LZO and Snappy in Hive tables to save storage and optimize data transfer over the network.
  • Created HBase tables for random reads/writes by MapReduce programs.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Designed the ETL process and created the high-level design document including the logical data flows, source data extraction process, database staging, job scheduling and error handling.
  • Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
  • Created ETL mappings with Talend Integration Suite to pull data from source, apply transformations and load data into the target database.

Environment: Hadoop, Cloudera (CDH 4), HDFS, Hive, HBase, Flume, Sqoop, Pig, Kafka, Java, Eclipse, Teradata, Tableau, Talend, MongoDB, Ubuntu, UNIX, and Maven.

Confidential, Fort Lauderdale, FL

Hadoop Developer

Responsibilities:

  • Worked with terabytes of structured and unstructured data (240 TB with replication factor coming in from multiple web sources).
  • Designed and developed entire pipeline from data ingestion to reporting tables.
  • Scrutinized Hadoop Log Files, executed performance scripts.
  • Used Cloudera Manager to monitor and manage Hadoop Cluster.
  • Worked on streaming log data into HDFS from web servers using Flume.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Extracted data from Oracle and Teradata databases into HDFS using Sqoop.
  • Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Involved in loading data from the Linux file system to HDFS, and importing and exporting data into HDFS and Hive using Sqoop.
  • Good understanding of Amazon web services like Elastic MapReduce (EMR), EC2.
  • Working knowledge of MapR and Teradata unison to optimize high availability (HA).
  • Installed, configured and participated in Hadoop MapReduce, Pig, Hive, Oozie and Sqoop environment.
  • Extended Hive and Pig core functionality by using custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).
  • Experienced in running Hadoop streaming jobs to process terabytes of JSON format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Performed data cleaning, integration, transformation and reduction by developing MapReduce jobs in Java for data mining.
  • Imported and exported data into MapR-FS from various web servers.
  • Created Hive tables, loaded data into them and customized Hive queries, which run internally as MapReduce jobs.
  • Performed Map-Side joins and Reduce-Side joins for large tables.
  • Defined schemas, created new relations, and performed Pig joins, grouping, sorting and filtering on large data sets.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Oozie, Cloudera CDH4.5, Cloudera CDH5.1.3, SQL, LINUX, Java, J2EE, Web services, PostgreSQL, DB2.

Confidential

Java Developer

Responsibilities:

  • Actively involved from the start of the project, from requirements gathering through quality assurance testing.
  • Coded and developed a multi-tier architecture in Java, J2EE and Servlets.
  • Conducted analysis, requirements study and design according to various design patterns, and developed according to the use cases, taking ownership of the features.
  • Used various design patterns such as Command, Abstract Factory, Factory and Singleton to improve the system performance. Analyzed critical coding defects and developed solutions.
  • Developed configurable front end using Struts technology. Also involved in component based development of certain features which were reusable across modules.
  • Designed, developed and maintained the data layer using the ORM framework called Hibernate.
  • Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, storage and updates in the Oracle database using Hibernate.
  • Developed batch jobs that run at specified times to implement certain logic on the Java platform.
  • Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
  • Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database, worked on DB2 and executed DDL and DML statements.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed Waterfall development methodologies for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing training presentations for developers (offshore support), QA and production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.

Environment: Java JDK (1.5), Java J2EE, Informatica, Oracle 11g (TOAD and SQL Developer), Servlets, JBoss Application Server, Waterfall, JSPs, EJBs, DB2, RAD, XML, Web Server, JUnit, Hibernate, MS Access, Microsoft Excel.

Confidential

Java/J2EE Developer

Responsibilities:

  • Implemented several design patterns such as Observer, Factory, Singleton and Facade.
  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Interacted with the Business Analyst and Host to understand the requirements, using Agile methodologies and Scrum meetings to track and optimize for end-client needs.
  • Created Use Case Diagrams, Class Diagrams, Activity Diagrams during the design phase.
  • Developed the project using MVC Design pattern.
  • Designed and Developed Server-side Components (DAO, Session Beans) Using J2EE.
  • Worked with Core Java concepts like Collections Framework, multithreading, memory management.
  • Used JDBC connectivity and JDBC statements, Prepared Statements, Callable Statements for querying, inserting, updating, deleting data from Oracle databases.
  • Developed Front-end Screens using HTML, CSS, and JavaScript.
  • Developed Date Time Picker using Object Oriented JavaScript extensively.
  • Performed code reviews and refactoring during development and strictly adhered to the checklist.
  • Used Jenkins for continuous integration and Subversion as the version control system for the application.
  • Used Log4j for logging and tracing the code.
  • Performed client-side validations using JavaScript.
  • Optimized XML parsers like SAX and DOM for the production data.
  • Good understanding of Teradata MPP architecture, including partitioning and primary indexes.
  • Good knowledge in Teradata Unity, Teradata Data Mover, OS PDE Kernel internals, Backup and Recovery.
  • Implemented the JMS Topic to receive the input in the form of XML and parsed them through a common XSD.
  • Used JDBC Connections and WebSphere Connection pool for database access.
  • Developed and modified several Database Procedures, Triggers and views to implement the business logic for the application.
  • Used TOAD to monitor the turnaround times of queries and to test all the connections.
  • Prepared the test plans and executed test cases for unit, integration and system testing.
  • Developed multiple unit and integration tests using Mockito and EasyMock.
  • Used JIRA for reporting bugs in the application.

Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JMS, JavaScript, XSLT, HTML, CSS, SAX, DOM, XML, UML, TOAD, Mockito, Oracle, Eclipse RCP, JIRA, WebSphere, Unix/Windows.

Confidential

Junior Java Developer

Responsibilities:

  • Extensive involvement in requirement analysis and system implementation.
  • Actively involved in SDLC phases like Analysis, Design and Development.
  • Responsible for developing modules and assisting in deployment as per the client’s requirements.
  • Implemented the application using JSP, with servlets implementing the business logic.
  • Developed utility and helper classes and Server side Functionalities using servlets.
  • Created DAO classes and wrote various SQL queries to perform DML operations on the data as per the requirements.
  • Created custom exceptions and implemented exception handling using try, catch and finally blocks.
  • Developed user interface using JSP, JavaScript and CSS Technologies.
  • Implemented User Session tracking in JSP.
  • Involved in Designing DB Schema for the application.
  • Implemented Complex SQL Queries, Reusable Triggers, Functions, Stored procedures using PL/SQL.
  • Took part in pair programming, code reviews and debugging.
  • Involved in Tool development, Testing and Bug Fixing.
  • Performed unit testing for various modules.
  • Involved in UAT and production deployments and support activities.

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MySQL, IBM WebSphere, JIRA.
