Big Data Developer Resume
Irving, TX
SUMMARY:
- IT professional with 5 years of experience using Java/J2EE technologies, including 3+ years of experience with Hadoop and related Big Data technologies.
- Hands-on experience with major Big Data components such as HDFS, Spark, Hive, MapReduce, Kafka, Sqoop, Oozie and HBase, along with Python.
- Experienced in working with Cloudera (CDH), Hortonworks (HDP) and Amazon EMR environments.
- Strong experience writing Scala-based Spark applications to perform large-scale data transformations, data cleansing and other data preparation for advanced analytics (a brief sketch follows this summary).
- Good understanding of Partitioning, Bucketing and Join optimizations in Hive.
- Experience in performance tuning of MapReduce jobs, Pig jobs and Hive queries.
- Ingested data from various databases such as Teradata, DB2 and SQL Server using Sqoop.
- Extensive experience migrating ETL jobs to Hadoop using Sqoop, MapReduce, Pig and Hive.
- Experienced with build tools such as Maven and Ant, and with continuous integration tools such as Jenkins.
- Hands-on experience with relational databases such as Oracle, MySQL, PostgreSQL and MS SQL Server.
- Extensive experience developing and deploying applications on WebLogic, Apache Tomcat and JBoss.
- Experience with supervised and unsupervised machine learning algorithms.
- Hands-on experience training, evaluating and scoring models as part of machine learning work using Spark MLlib and TensorFlow.
- Hands-on development experience with RDBMS, including SQL queries, PL/SQL, views, stored procedures and triggers.
- Participated in all Business Intelligence activities related to data warehouse, ETL and report development methodology.
- Highly motivated, dynamic, self-starter with keen interest in emerging technologies.
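As a brief illustration of the Scala-based Spark data preparation described above, the following is a minimal, hypothetical sketch; the database, table and column names are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleanseCustomerFeed {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CleanseCustomerFeed")
      .enableHiveSupport()
      .getOrCreate()

    // Read a raw feed from Hive (table name is illustrative).
    val raw = spark.table("staging.customer_feed")

    // Typical cleansing steps: normalize strings, drop malformed rows, de-duplicate.
    val cleansed = raw
      .withColumn("email", lower(trim(col("email"))))
      .filter(col("customer_id").isNotNull && col("email").contains("@"))
      .dropDuplicates("customer_id")

    // Write the prepared data back to Hive for downstream analytics.
    cleansed.write.mode("overwrite").saveAsTable("curated.customer_feed_clean")

    spark.stop()
  }
}
```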
TECHNICAL SKILLS:
Big Data Technologies: Hadoop Ecosystem, Spark, MapReduce, YARN, Kafka, Pig, Hive, HBase, Flume, Oozie, AWS, Elastic MapReduce, Azure, Machine Learning
Operating systems: Windows, UNIX, Linux
Programming Languages: Java, SQL, PL/SQL, Python, Linux shell scripting and REST
RDBMS: Oracle 10g, MySQL, SQL Server, DB2.
NoSQL: HBase, Cassandra, MongoDB.
PROFESSIONAL EXPERIENCE:
Confidential - Irving, TX
Big Data Developer
Responsibilities:
- Used Sqoop to import data into HDFS from Oracle database.
- Performed detailed analysis of system and application architecture components against functional requirements.
- Used Spark to read data from Hive and write it to HBase.
- Optimized Hive tables using partitioning and bucketing to improve HiveQL query performance (see the partitioned-table sketch after this list).
- Worked with multiple file formats such as Avro, SequenceFile, Parquet and ORC, and read these formats from HDFS using Scala.
- Converted existing MapReduce programs to Spark applications for handling semi-structured data such as JSON files, Apache log files and other custom log data.
- Used Kafka as the data pipeline between producers and a Spark Streaming application acting as the consumer (see the streaming sketch after this list).
- Tuned Spark/Python code to improve the performance of machine learning algorithms for data analysis.
- Installed, configured and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark and YARN.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Set up alerts in Cloudera Manager to monitor the health and performance of Hadoop clusters.
- Installed and configured Kerberos authentication for the cluster.
- Commissioned and decommissioned DataNodes in the cluster.
- Installed and configured Apache Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
- Created directories and set up appropriate permissions for different applications and users.
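As a rough illustration of the partitioning work noted above, here is a minimal Spark SQL sketch in Scala; the database, table and column names are hypothetical, and bucketing would be added with a CLUSTERED BY ... INTO n BUCKETS clause in the DDL.

```scala
import org.apache.spark.sql.SparkSession

object LoadPartitionedOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadPartitionedOrders")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned target table (DDL and names are illustrative).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated.orders_part (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Load from a raw Hive table using dynamic partitioning.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE curated.orders_part PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date
        |FROM staging.orders_raw""".stripMargin)

    spark.stop()
  }
}
```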
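And a minimal sketch of a Kafka-to-Spark consumer like the one described above, written here with Spark Structured Streaming's Kafka source (requires the spark-sql-kafka package); the broker addresses, topic name and HDFS paths are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object KafkaEventsConsumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaEventsConsumer")
      .getOrCreate()

    // Subscribe to a Kafka topic (broker list and topic name are illustrative).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Land the raw messages on HDFS as Parquet, with checkpointing for recovery.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/raw/events")
      .option("checkpointLocation", "/checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```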
Environment: Hadoop, Spark, HDFS, Hive, Kafka, Scala, Python, Cloudera Manager, Sqoop, Flume, Oozie, CDH5, MongoDB, Cassandra, HBase, Hue, Kerberos and Unix/Linux
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Wrote transformer/mapper MapReduce pipelines in Java.
- Created Hive tables, loaded them with data and wrote Hive queries that invoke MapReduce jobs in the backend.
- Loaded data into HBase using the HBase shell, the HBase Client API, Pig and Sqoop (a client API sketch follows this list).
- Designed and implemented incremental imports into Hive tables.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Deployed an Apache Solr search engine server to speed up searches of government cultural assets.
- Collected, aggregated and moved data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with the Avro data serialization system to handle JSON data formats.
- Processed different file formats such as SequenceFiles, XML files and MapFiles using MapReduce programs.
- Performed unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts to automate end-to-end data management and synchronization between all clusters.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
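A minimal sketch of loading a single row through the HBase Client API, as referenced in the bullets above; it is written in Scala against the standard Java client, and the table name, row key and column family are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoadExample {
  def main(args: Array[String]): Unit = {
    // Standard HBase client setup; hbase-site.xml on the classpath supplies the quorum.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Table and column family names are illustrative.
      val table = connection.getTable(TableName.valueOf("assets"))
      val put = new Put(Bytes.toBytes("asset#0001"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("title"), Bytes.toBytes("Cultural asset record"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("source"), Bytes.toBytes("flume-ingest"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```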
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Python, Eclipse, Cassandra, Cloudera Hadoop distribution, Windows NT, UNIX shell scripting and PuTTY
Confidential, Cupertino, CA
Hadoop Developer
Responsibilities:
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data processing.
- Used Impala to read, write and query Hadoop data in HDFS, and configured Kafka to read and write messages from external programs.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Created stored procedures to transform the data and worked extensively in SQL for the various transformations needed while loading the data.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection tests, permission checks and performance analysis.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames (see the sketch after this list).
- Implemented Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Bulk-loaded large volumes of data into HBase with MapReduce by directly creating HFiles and loading them.
- Implemented Spark jobs in Scala, using DataFrames and the Spark SQL API for faster data processing.
- Imported data from different sources into HDFS using Sqoop and performed transformations using Hive and MapReduce before loading the results back into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
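A minimal sketch of rewriting a HiveQL aggregation as Spark DataFrame transformations, as referenced above; the HiveQL shown in the comment and the table and column names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveQueryAsDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQueryAsDataFrame")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (names are illustrative):
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM sales.transactions WHERE status = 'COMPLETE'
    //   GROUP BY customer_id
    val totals = spark.table("sales.transactions")
      .filter(col("status") === "COMPLETE")
      .groupBy("customer_id")
      .agg(sum("amount").alias("total"))

    // Persist the result so it can be exported to MySQL with Sqoop downstream.
    totals.write.mode("overwrite").saveAsTable("sales.customer_totals")

    spark.stop()
  }
}
```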
Environment: Cloudera, Hadoop, HDFS, Hive, Impala, Spark SQL, Python, Flume 1.7.0, Sqoop, Oozie, Storm, Spark, Scala, ZooKeeper, MySQL, Shell Scripting
Confidential
Java Developer
Responsibilities:
- Involved in designing and developing modules at both Client and Server Side.
- Developed the UI using JSP, JavaScript and HTML.
- Responsible for validating the data at the client side using JavaScript.
- Interacted with external services to retrieve user information using SOAP web service calls.
- Developed web components using JSP, Servlets and JDBC.
- Technical analysis, design, development and documentation with a focus on implementation and agile development.
- Developed a web-based reporting system with JSP, DAOs and the Apache Struts Validator using the Struts framework.
- Developed user and technical documentation.
- Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.
- Installed and configured the Apache Web server and deployed JSPs and Servlets in Tomcat Server.
Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.