Hadoop/Spark Developer Resume
Tampa, FL
PROFESSIONAL SUMMARY:
- 8+ years of experience in Information Technology, including analysis, design, and development of Big Data solutions using Hadoop, design and development of web applications using Java, J2EE, and Python, and work with database and data warehousing technologies.
- 5+ years of work experience in Big Data analytics, with hands-on experience in major components of the Hadoop ecosystem such as MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, Spark Streaming, Spark SQL, NiFi, Kafka, and Impala.
- Experience with various distributions: Cloudera (CDH3/CDH4), MapR, and Hortonworks, plus knowledge of the Amazon EMR Hadoop distribution.
- Extensive experience with real-time streaming applications and batch-style, large-scale distributed computing applications, integrating Kafka with Apache NiFi and Spark.
- Experience developing data pipelines using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
- Involved in developing Impala scripts for the extraction, transformation, and loading of data into the data warehouse.
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Strong understanding of AWS compute services such as EC2, Elastic MapReduce (EMR), and EBS, including accessing instance metadata.
- Good experience writing reusable, configurable components in Java, Scala, and Python as part of project requirements.
- Good experience working with cloud environments such as Amazon Web Services (AWS), including EC2 instances and S3, and configuring servers for Auto Scaling and Elastic Load Balancing.
- Experience applying current development approaches, including building applications in Spark with Scala to compare the performance of Spark against Hive and SQL/Oracle.
- Experience writing Spark RDD transformations and actions on input data, and Spark SQL queries over DataFrames to import data from data sources, perform transformations and read/write operations with Spark Core, and save the results to an output directory in HDFS (see the sketch at the end of this summary).
- Experience with Storm for adding reliable real-time data processing capabilities to enterprise Hadoop.
- Good experience in scripting for automation and monitoring using Shell, Python, and Perl scripts.
- Good experience in ETL, data integration, and migration; extensively used ETL methodology to support data extraction, transformation, and loading with Hive, Pig, and HBase.
- Responsible for developing, supporting, and maintaining ETL processes using Informatica PowerCenter.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts and the HDFS framework.
- Good knowledge of Scala's functional programming techniques such as anonymous functions (closures), currying, higher-order functions, and pattern matching.
- Good experience in importing data into HBase using Sqoop and HBase Client API.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs).
- Good experience designing data models in Cassandra and working with the Cassandra Query Language (CQL).
- Good knowledge of source control repositories such as SVN, CVS, and Git, and of build tools such as SBT, Ant, and Maven.
- Working knowledge and experience with Agile (including Scrum) and Waterfall methodologies.
- Experience writing database objects such as stored procedures, functions, triggers, PL/SQL packages, and cursors for Oracle, SQL Server, and MySQL databases.
- Expertise in core Java packages and object-oriented design.
- Developed core modules in large cross-platform applications using various technologies like JSP, Servlets, Struts, Hibernate, Spring MVC, JDBC, Spring Boot, JMS, JSF, XML, AJAX, SOAP and RESTful Web Services.
- Good knowledge of web/application servers such as Apache Tomcat, IBM WebSphere, and Oracle WebLogic.
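The sketch below illustrates, at a minimal level, the kind of Spark RDD/DataFrame transformation and HDFS write described in the summary above; it is not code from any engagement listed later, the table name, column names, and output path are hypothetical, and it assumes a Spark 2.x session with Hive support.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LearnerEventsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("learner-events-transform")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read raw events from a Hive table (hypothetical name)
    val raw = spark.table("staging.learner_events")

    // RDD-style transformation and action alongside DataFrame operations
    val errorCount = raw.rdd
      .filter(r => r.getAs[String]("status") == "ERROR")
      .count()

    // DataFrame transformation: daily counts per event type
    val daily = raw
      .withColumn("event_date", to_date($"event_ts"))
      .groupBy($"event_date", $"event_type")
      .count()

    // Persist results to an HDFS output directory (hypothetical path)
    daily.write.mode("overwrite")
      .parquet("hdfs:///warehouse/output/daily_event_counts")

    println(s"Error records seen: $errorCount")
    spark.stop()
  }
}
```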
TECHNICAL SKILLS:
Big Data Eco Systems: HDFS, MapReduce, Hive, YARN, HBase, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, Apache Tez, Impala, NiFi
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, JMS, JSP, Servlets, EJB
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8/10
Web Technologies: HTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic.
Version control: SVN, CVS, GIT
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
PROFESSIONAL EXPERIENCE:
Confidential, Tampa, FL
Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which receives data from Kafka in near real time and persists it to Cassandra (an illustrative sketch follows this list).
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
- Loaded data into Spark RDDs and performed advanced procedures such as text analytics, using Spark's in-memory computation capabilities to generate the output response.
- Executed many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Handled large datasets during the ingestion process itself using partitions, Spark broadcast variables, and effective, efficient joins and transformations.
- Configured Spark Streaming to consume data from Kafka and store it in HDFS.
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second, and used the Kafka producer API to produce messages.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Tuned Spark applications to set the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Wrote Sqoop scripts for importing and exporting data between RDBMS and HDFS.
- Ingested data from RDBMS to Hive to perform data transformations, and then export the transformed data to Cassandra for data access and analysis.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.
- Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
- Wrote Spark applications in Scala that interact with the MySQL database through Spark's SQLContext and access Hive tables through HiveContext.
- Extracted the data from Teradata into HDFS/Dashboards using Spark Streaming.
- Implemented Informatica Procedures and Standards while developing and testing the Informatica objects.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
- Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR.
- Wrote AWS Lambda functions in Scala with cross-functional dependencies, generating custom libraries for deploying the Lambda functions in the cloud.
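An illustrative sketch of the Kafka-to-Cassandra streaming path described above; the broker, topic, keyspace, table, and column names are hypothetical, and it assumes the spark-streaming-kafka-0-10 and DataStax spark-cassandra-connector libraries on the classpath.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._

object KafkaToCassandraStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092", // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Parse each record and persist it to Cassandra via the DataStax connector
    stream.map(_.value.split(","))
      .filter(_.length >= 3)
      .map(f => (f(0), f(1), f(2).toDouble))
      .foreachRDD { rdd =>
        rdd.saveToCassandra("learner_ks", "learner_events",
          SomeColumns("learner_id", "event_type", "score")) // hypothetical keyspace/table
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```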
Environment: Hadoop YARN, Spark-Core 2.0, Spark-Streaming, Spark-SQL, Scala 2.10.4, Python, Kafka 1.1.0, Hive 2.2.0, Sqoop, Amazon AWS, Oozie, Impala, Cassandra, Cloudera, MySQL, Informatica Power Center 9.6.1, Linux, Zookeeper, AWS EMR, EC2, S3.
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Worked on Spark SQL to handle structured data in Hive.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating shell scripts to simplify the execution of all other scripts and to move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using HUE (Hadoop User Experience).
- Involved in collecting and aggregating large amounts of log data using Storm and staging the data in HDFS for further analysis.
- Created Hive external tables using the Accumulo connector.
- Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Developed Spark scripts using the Python shell.
- Stored the processed results in the data warehouse and maintained the data using Hive.
- Worked with the Spark ecosystem, using Spark SQL and Scala queries on different file formats such as text and CSV (see the sketch after this list).
- Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Worked with and gained substantial experience in Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
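A short sketch of querying text and CSV data with Spark SQL and Scala as described above; the HDFS paths and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object FileFormatQueries {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-format-queries")
      .getOrCreate()

    // CSV with a header row, schema inferred (hypothetical HDFS path)
    val sales = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/incoming/sales.csv")

    sales.createOrReplaceTempView("sales")
    val topRegions = spark.sql(
      """SELECT region, SUM(amount) AS total
        |FROM sales
        |GROUP BY region
        |ORDER BY total DESC""".stripMargin)
    topRegions.show(10)

    // Plain text file processed with Dataset operations
    val lines = spark.read.textFile("hdfs:///data/incoming/app.log")
    val warnings = lines.filter(_.contains("WARN")).count()
    println(s"Warning lines: $warnings")

    spark.stop()
  }
}
```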
Environment: Cloudera, HDFS, MapReduce, Storm, Hive 1.2.0, Pig 0.15.0, SQOOP, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, Git, Maven.
Confidential, San Antonio, TX
Hadoop Developer
Responsibilities:
- Integrated Hive and HBase for effective usage and performed MRUnit testing of the MapReduce jobs.
- Created BI reports (Tableau) and dashboards from HDFS data using Hive.
- Performed importing and exporting of data between relational database systems and HDFS using Sqoop.
- Developed a common framework to import the data from Teradata to HDFS and to export to Teradata using Sqoop.
- Imported the log data from different servers into HDFS using Flume and developed MapReduce programs for analyzing the data.
- Used Flume to handle the real time log processing for attribution reports.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from UNIX file system to HDFS.
- Applied the partitioning pattern in MapReduce to move records into different categories (a sketch follows this list).
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Involved in developing templates and screens in HTML and JavaScript.
- Created HBase tables to load large sets of data coming from UNIX and NoSQL sources.
- Implemented the web service client for login authentication, credit reports, and applicant information using Apache Axis2 web services.
- Designed, developed, tested, implemented, and supported data warehousing ETL using Talend and Hadoop technologies.
- Built and deployed Java applications into multiple Unix-based environments and produced both unit and functional test results along with release notes.
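A brief sketch of the MapReduce partitioning pattern mentioned above, written in Scala for consistency with the other sketches; the category names and key layout are hypothetical.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Routes records to reducers by category so each reducer's output holds one category.
// Assumes keys shaped like "category|recordId" (hypothetical layout).
class CategoryPartitioner extends Partitioner[Text, IntWritable] {
  private val categories = Map("retail" -> 0, "wholesale" -> 1, "online" -> 2)

  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val category = key.toString.split("\\|")(0)
    // Unknown categories fall into an overflow partition
    categories.getOrElse(category, categories.size) % numPartitions
  }
}
```

In the driver, this would be registered with job.setPartitionerClass(classOf[CategoryPartitioner]) and the reducer count set to match the number of categories.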
Environment: WebSphere 6.1, HTML, XML, ANT 1.6, MapReduce, Sqoop, UNIX, NoSQL, Java, JavaScript, MR Unit, Teradata, Node.js, JUnit 3.8, ETL, Talend, HDFS, Hive, HBase.
Confidential
Software Engineer
Responsibilities:
- Involved in various phases of SDLC (Software Development Life Cycle) such as requirements gathering, analysis, design, modeling, and development.
- Involved in developing the presentation layer for the project.
- Designed the application by implementing the JSF framework based on MVC architecture.
- Developed statistics graphs using JSP, custom tag libraries, Applets, and Swing in a multi-threaded architecture.
- Developed Enterprise Java Beans (stateless session beans) to handle different transactions such as bill payments to service providers.
- Experience in Core Java, the Collections Framework, JSP, dependency injection, Spring MVC, and RESTful web services.
- Developed source code in Eclipse Oxygen using Java, J2EE, and Spring MVC.
- Designed prototype for the project in JSP, Servlets, HTML, CSS and JavaScript.
- Developed UI screens using JSP, JavaScript, jQuery, XHTML, and CSS.
- Implemented various complex SQL queries.
- Working experience and deep knowledge of Oracle and MySQL databases.
- Designed test plans, scenarios, scripts, and procedures.
- Coordinated with the onsite group on production issues, development, and testing.
- Involved in writing the data extraction mechanism using JDBC (see the sketch after this list).
- Installed and configured Tomcat.
- Wrote SQL for data extraction from the MySQL database.
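A minimal sketch of the JDBC-based data extraction mentioned above, shown in Scala for consistency with the other sketches; the connection URL, credentials, table, and columns are hypothetical.

```scala
import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

object BillPaymentExtractor {
  // Fetches (payment_id, provider, amount) rows for one customer via plain JDBC.
  def fetchPayments(customerId: Long): Seq[(Long, String, Double)] = {
    val url = "jdbc:mysql://localhost:3306/billing" // hypothetical database
    val conn = DriverManager.getConnection(url, "app_user", "secret")
    val rows = ListBuffer.empty[(Long, String, Double)]
    try {
      val stmt = conn.prepareStatement(
        "SELECT payment_id, provider, amount FROM payments WHERE customer_id = ?")
      stmt.setLong(1, customerId)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        rows += ((rs.getLong("payment_id"), rs.getString("provider"), rs.getDouble("amount")))
      }
    } finally {
      conn.close()
    }
    rows.toList
  }
}
```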
Environment: Core Java, Servlets 2.2, JDBC, JSP 2.0, jQuery, JavaScript, HTML, CSS, Microsoft SQL Server, Eclipse 2.0, MySQL 4.0, Windows 2000