Sr. Spark/Hadoop Developer Resume
Omaha, NE
SUMMARY
- 8 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- Highly dedicated, results-oriented Hadoop developer with 4+ years of strong end-to-end Hadoop development experience across a variety of Big Data environment projects.
- Expertise in core Hadoop and Hadoop technology stack which includes HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Kafka, and Zookeeper.
- Experience with Spark's RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Wrote Flume configuration files to import streaming log data into HBase.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
- Experience in the installation and setup of various Kafka producers and consumers along with Kafka brokers and topics.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experienced in managing Hadoop cluster using Cloudera Manager Tool.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Java MapReduce and Pig jobs.
- Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Implemented data science algorithms such as shift detection in critical data points using Spark, doubling performance.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Experience in Apache Flume for efficiently collecting, aggregating, and moving large amounts of log data.
- Involved in developing web services using REST and the HBase native client API to query data from HBase.
- Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Involved in creating custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python (a minimal sketch appears after this summary list).
- Used a highly available AWS environment to launch applications in different regions and implemented CloudFront with AWS Lambda to reduce latency.
- Implemented CRUD operations using CQL on top of Cassandra file system.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Set up Solr for distributed indexing and search.
- Used Solr to enable indexing for searching on non-primary-key columns in Cassandra keyspaces.
- Excellent working Knowledge in Spark Core, Spark SQL, Spark Streaming.
- Real-time exposure to Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Extensive experience working with various Hadoop distributions, such as the enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights, and Amazon EMR (Elastic MapReduce).
- Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Developed automated processes for flattening upstream data from Cassandra, which arrives in JSON format, using Hive UDFs.
- Expertise in developing responsive Front-End components with JavaScript, JSP, HTML, XHTML, Servlets, Ajax, and AngularJS.
- Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Good understanding of Apache Hue.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Proficient with version control systems such as GitHub and SVN.
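To illustrate the Spark work referenced above (converting Hive/SQL queries into Spark RDD and DataFrame transformations), the following is a minimal PySpark sketch, not actual project code; the table name `web_logs` and its columns are hypothetical placeholders.

```python
# Minimal PySpark sketch: rewriting a Hive aggregation as DataFrame/RDD
# transformations. The table and column names (web_logs, user_id, bytes)
# are hypothetical placeholders, not taken from the resume.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()          # allows reading existing Hive tables
         .getOrCreate())

# Equivalent Hive query:
#   SELECT user_id, SUM(bytes) AS total_bytes
#   FROM web_logs WHERE bytes > 0 GROUP BY user_id;
logs_df = spark.table("web_logs")

totals_df = (logs_df
             .filter(F.col("bytes") > 0)                    # transformation
             .groupBy("user_id")
             .agg(F.sum("bytes").alias("total_bytes")))     # transformation

# The same aggregation expressed with the lower-level RDD API.
totals_rdd = (logs_df.rdd
              .map(lambda row: (row["user_id"], row["bytes"]))
              .filter(lambda kv: kv[1] > 0)
              .reduceByKey(lambda a, b: a + b))

totals_df.show(10)          # action: triggers execution
print(totals_rdd.take(10))  # action
```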
TECHNICAL SKILLS
Hadoop Distribution: Hortonworks, Cloudera (CDH3, CDH4, CDH5), Apache, Amazon AWS (EMR), MapR, and Azure.
Hadoop Data Services: HDFS, MapReduce, YARN, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, Avro, Parquet, Snappy, and NiFi.
Hadoop Operational Services: Zookeeper, Oozie
NO SQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redis
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, HTML, XML (XSD, XSLT, DTD), C, C++, Java, JavaScript, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB
Application Servers: WebLogic, WebSphere, Tomcat.
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server (SQL); HBase, Cassandra, Neo4j (NoSQL)
Operating Systems: UNIX, Windows, iOS, LINUX
Methodologies: Agile (Scrum), Waterfall
Other Tools: Putty, WinSCP, Stream Weaver.
PROFESSIONAL EXPERIENCE
Confidential, Omaha, NE
Sr. Spark/Hadoop Developer
Responsibilities:
- Extensively migrated existing architecture to Spark Streaming to process the live streaming data.
- Responsible for Spark Core configuration based on the type of input source.
- Executed Spark code written in Scala for Spark Streaming/SQL for faster data processing.
- Performed SQL Joins among Hive tables to get input for Spark batch process.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Developed Python code to gather data from HBase and designed the solution for implementation in PySpark.
- Developed PySpark code to mimic the transformations performed in the on-premises environment.
- Analyzed SQL scripts and designed PySpark implementations; created custom columns depending on the use case while ingesting data into the Hadoop data lake using PySpark.
- Developed an environmental search engine using Java, Apache Solr, and MySQL.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirement.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Designed multiple Python packages that were used within a large ETL process used to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Loaded data from the Linux file system to HDFS and vice versa.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, writing results back into OLTP systems through Sqoop.
- Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Exported the analyzed patterns back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Participated in the development and implementation of a Cloudera Impala Hadoop environment.
- Utilized the Apache Hadoop environment provided by Cloudera.
- Collected data using Spark Streaming and loaded it into the Cassandra cluster (a minimal sketch of the Kafka-to-Spark Streaming flow appears after this list).
- Developed Scala scripts using DataFrames/SQL/Datasets and RDDs in Spark for data aggregation and queries, writing data back into OLTP systems through Sqoop.
- Extensively used Zookeeper as a job scheduler for Spark jobs.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Wrote Java code to format XML documents and uploaded them to the Solr server for indexing.
- Used AWS to export MapReduce jobs into Spark RDD transformations.
- Writing AWS Terraform templates for any automation requirements in AWS services.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Deployed and configured AWS EC2 for client websites moving from self-hosted services for scalability purposes.
- Worked with multiple teams to provision AWS infrastructure for development and production environments.
- Experience designing and monitoring multi-data-center Kafka clusters.
- Designed number of partitions and replication factor for Kafka topics based on business requirements.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using python (PySpark).
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Experience on Kafka and Spark integration for real time data processing.
- Developed Kafka producer and consumer components for real time data processing.
- Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
- Experience in configuring, designing, implementing, and monitoring Kafka clusters and connectors.
- Performed Oracle SQL tuning using explain plans.
- Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
- Involved in setting up MapReduce 1 and MapReduce 2.
- Prepared Avro schema files for generating Hive tables.
- Used Impala connectivity from the user interface (UI) and queried results using Impala QL.
- Worked on physical transformations of data model which involved in creating Tables, Indexes, Joins, Views and Partitions.
- Involved in analysis, design, system architecture design, process interface design, and documentation.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Involved in Cassandra data modelling to create keyspaces and tables in a multi-data-center DSE Cassandra DB.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
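As a companion to the Kafka and Spark Streaming bullets above, the sketch below shows the general shape of that integration under stated assumptions: a broker at localhost:9092, a hypothetical topic named `events`, and a console sink standing in for the Cassandra/OLTP sinks used on the project. It is illustrative only, not the production job.

```python
# Minimal sketch of a Kafka -> Spark Structured Streaming flow.
# Broker address (localhost:9092) and topic name ("events") are hypothetical.
# The streaming read requires the spark-sql-kafka connector on the classpath.
from kafka import KafkaProducer            # kafka-python client
from pyspark.sql import SparkSession

# --- producer side: push a few test messages onto the topic ---
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(5):
    producer.send("events", value=f"event-{i}".encode("utf-8"))
producer.flush()

# --- consumer side: read the topic as a streaming DataFrame ---
spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value"))

# Running count per message value; a real job would parse the payload and
# write to a sink such as Cassandra or HDFS instead of the console.
counts = events.groupBy("value").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```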
Environment: Cloudera, Spark, Impala, Sqoop, Flume, Cassandra, Kafka, Hive, ZooKeeper, Oozie, RDBMS, AWS.
Confidential, San Francisco, CA
Sr. Spark/Hadoop Developer
Responsibilities:
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Responsible for managing data coming from different sources.
- Developed Batch Processing jobs using Pig and Hive.
- Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
- Worked with different File Formats like TEXTFILE, AVROFILE, ORC, and PARQUET for HIVE querying and processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Implemented Elastic Search on Hive data warehouse platform.
- Good experience analyzing the Hadoop cluster and using different analytic tools such as Pig and Impala.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Enabled concurrent access to Hive tables with shared and exclusive locking, implemented in Hive with the help of Zookeeper in the cluster.
- Stored and loaded data from HDFS to Amazon S3 and backed up namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on Apache Hadoop environment by Hortonworks (HDP 2.2).
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Troubleshooting; managed and reviewed data backups and Hadoop log files on the Hortonworks cluster.
- Used PIG to perform data validation on the data ingested using Sqoop and Flume and the cleansed data set is pushed into MongoDB.
- Ingested streaming data with Apache NiFi into Kafka.
- Worked with Nifi for managing the flow of data from sources through automated data flow.
- Designed and implemented the MongoDB schema.
- Wrote services to store and retrieve user data from MongoDB for the application on devices (a minimal sketch appears after this list).
- Used Mongoose API to access the MongoDB from NodeJS.
- Created and implemented business validation and coverage price-gap rules on Hive using the Talend tool.
- Wrote shell scripts to automate rolling day-to-day processes.
- Wrote shell scripts to monitor Hadoop daemon services and respond accordingly to any warning or failure conditions.
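The MongoDB bullets above mention designing the schema and writing services to store and retrieve user data. Below is a minimal pymongo sketch of that pattern; the connection URI, database/collection names, and document fields are hypothetical, and the project's actual services were part of the application stack rather than this standalone script.

```python
# Minimal pymongo sketch of storing and retrieving user data.
# URI, database/collection names, and document fields are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client["app_db"]["users"]          # database and collection

# Store (upsert) a user document keyed by a hypothetical user_id field.
users.update_one(
    {"user_id": "u-1001"},
    {"$set": {"name": "Test User", "plan": "basic", "active": True}},
    upsert=True,
)

# Retrieve a single user and a filtered list of active users.
user = users.find_one({"user_id": "u-1001"})
active_users = list(users.find({"active": True}).limit(10))

print(user)
print(len(active_users), "active users fetched")
```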
Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse, WinSCP, Hortonworks.
Confidential, San Diego, CA
Hadoop Developer
Responsibilities:
- Launched and set up the Hadoop cluster, which included configuring the different Hadoop components.
- Hands-on experience loading data from the UNIX file system to HDFS.
- Wrote the Map Reduce jobs to parse the web logs which are stored in HDFS.
- Managed the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Cluster coordination services through Zookeeper.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
- Expertise in partitioning and bucketing concepts in Hive; analyzed the data using HiveQL.
- Installed and configured Flume, Hive, PIG, Sqoop and Oozie on the Hadoop cluster.
- Involved in creating Hive tables, loading data and running Hive queries in those data.
- Extensive Working knowledge of partitioned table, UDFs, performance tuning, compression-related properties, thrift server in Hive.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Loaded data into HBase using Pig, Hive, and Java APIs.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
- Experienced in performing CRUD operations in HBase (a minimal sketch appears after this list).
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile using Apache Flume and stored the data into HDFS/HBase for analysis.
- Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Created Map Reduce programs for some refined queries on big data.
- Working knowledge of writing Pig Load and Store functions.
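The HBase bullets above reference creating tables, loading data, and performing CRUD operations. The sketch below shows equivalent CRUD calls using the Python happybase client purely for illustration (the role itself used Pig, Hive, and the Java APIs); the table name, column family, and Thrift host are assumptions, and the HBase Thrift server must be running.

```python
# Minimal HBase CRUD sketch using the happybase client (Thrift-based).
# The table name "web_logs", column family "cf", and host are illustrative
# assumptions; the project used Pig/Hive/Java APIs rather than this client.
import happybase

connection = happybase.Connection("localhost")   # HBase Thrift server

# Create the table once if it does not already exist.
if b"web_logs" not in connection.tables():
    connection.create_table("web_logs", {"cf": dict()})

table = connection.table("web_logs")

# Create / update: put a row keyed by a hypothetical log id.
table.put(b"log-0001", {b"cf:url": b"/index.html", b"cf:status": b"200"})

# Read: fetch one row and scan a key range by prefix.
print(table.row(b"log-0001"))
for key, data in table.scan(row_prefix=b"log-"):
    print(key, data)

# Delete: remove the row when it is no longer needed.
table.delete(b"log-0001")

connection.close()
```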
Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, Zookeeper, Sqoop, Cassandra, Hive, PIG, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.
Confidential, Dallas, TX
Java/Hadoop Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classic Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converting the requirements into technical specifications.
- Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
- Created Business Logic using Servlets, Session beans and deployed them on Web logic server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Amazon Web services for the data maintenance and structures.
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Selecting the appropriate AWS service based upon data, compute, system requirements.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Experience creating integration between Hive and HBase for effective usage, and performed MRUnit testing for the MapReduce jobs.
- Gained good experience with NoSQL databases such as MongoDB.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
- Used Hibernate ORM framework with spring framework for data persistence and transaction management.
- Used struts validation framework for form level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in developing templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using WSDL and UDDI.
- Suggested latest upgrades for Hadoop clusters.
- Created HBase tables to load large sets of data coming from UNIX and NoSQL sources.
- Built and deployed Java applications into multiple Unix-based environments and produced both unit and functional test results along with release notes.
Environment: JDK 1.5, J2EE 1.4, Struts 1.3, JSP, Servlets 2.5, WebSphere 6.1, HTML, XML, ANT 1.6, JavaScript, Node.js, JUnit 3.8, HDFS, MongoDB, Hive, HBase, UNIX, AWS
Confidential
Java Developer
Responsibilities:
- Played an active role in the team by interacting with welfare business analyst/program specialists and converted business requirements into system requirements.
- Developed and deployed UI layer logics of sites using JSP.
- Used Struts (MVC) to implement the business model logic.
- Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mappings, and message resource bundles, and used JNDI for lookup of J2EE components.
- Developed dynamic JSP pages with Struts.
- Developed the XML data object to generate the PDF documents and other reports.
- Used Hibernate, DAOs, and JDBC for data retrieval from and modifications to the database.
- Handled messaging and interaction of web services using SOAP and REST.
- Developed JUnit test cases for unit testing as well as system and user test scenarios.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, Web services, RESTful services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, MyEclipse.
Confidential
Java Developer
Responsibilities:
- Implemented the project according to the Software Development Life Cycle (SDLC)
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database
- Created Stored Procedures to manipulate the database and to apply the business logic according to the user’s specifications
- Developed the Generic Classes, which includes the frequently used functionality, so that it can be reusable
- Implemented an exception management mechanism using exception-handling application blocks to handle exceptions
- Designed and developed user interfaces using JSP, JavaScript and HTML
- Involved in Database design and developing SQL Queries, stored procedures on MySQL
- Used CVS for maintaining the Source Code
- Logging was done through log4j
Environment: Java, JavaScript, HTML, log4j, JDBC drivers, SOAP web services, UNIX, shell scripting, SQL Server.