- Around 4 years of IT experience as a Spark/Hadoop Developer and Java Developer.
- One year of experience working as a Graduate Research Assistant at Purdue University.
- Experience working on the Hadoop ecosystem: Hive, Pig, Zookeeper, Sqoop, Oozie, and Apache Spark.
- Experienced in developing applications in Scala and Java.
- Hands-on experience importing and exporting data between RDBMS and HDFS using Sqoop.
- Experienced in Apache Spark, Spark RDDs, Spark Core, and Spark SQL.
- Hands-on knowledge of Core Java and the Spring Boot and Hibernate frameworks.
- Familiar with Hadoop distributions such as Cloudera and Hortonworks.
- Knowledge of spinning up EMR clusters and running Hadoop and Spark jobs.
- Knowledge of AWS services: RDS, EMR, EC2, and S3.
- Knowledge of Kafka and Spark Streaming.
- Worked in teams following Agile methodologies.
Languages: Scala, Python, Java.
Frameworks & Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, Apache Spark, Kafka, Spring Boot, Hibernate, AWS EMR, S3, EC2, RDS.
Big Data environments: Cloudera, Hortonworks.
Databases: Oracle, MySQL.
IDEs: IntelliJ, Eclipse, NetBeans, Anaconda.
Other tools: Microsoft Visio, UML, Git, GitHub, Pivotal Tracker, PuTTY.
Confidential, Bentonville, AR.
Big Data Developer
- Involved in developing end-to-end Spark analytical applications for business insights.
- Involved in requirement gathering and designing the solution.
- Worked on moving structured data from Teradata to Hive and HDFS using TDCH (Teradata Connector for Hadoop).
- Worked on validating data after import.
- Created Oozie workflows to trigger Spark and MLP jobs.
- Developed Java application to trigger Oozie workflows and monitor the jobs’ status.
- Implemented Hibernate to log job details in MySQL.
- Developed Python scripts for coordination with different tools in the organization.
- Worked on optimizing Hive and Spark scripts.
- Worked on converting Hive queries to Spark using Python.
- Worked on Spark optimizations and memory management.
- Used Git for version control.
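The bullets above mention a Java application that triggers Oozie workflows. A minimal sketch of how such a trigger could assemble the Oozie CLI submission command is shown below; the server URL, properties path, and class name are illustrative assumptions, not values from the project:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of assembling an `oozie job -run` command for a workflow trigger.
// The URL and properties path are placeholders, not actual project values.
public class OozieTrigger {

    // Build the argument list for submitting a workflow; a caller could
    // hand this list to a ProcessBuilder to launch the Oozie CLI.
    public static List<String> buildRunCommand(String oozieUrl, String jobProperties) {
        return Arrays.asList(
                "oozie", "job",
                "-oozie", oozieUrl,
                "-config", jobProperties,
                "-run");
    }

    public static void main(String[] args) {
        List<String> cmd = buildRunCommand(
                "http://oozie-host:11000/oozie",  // placeholder server URL
                "/tmp/workflow/job.properties");  // placeholder config path
        System.out.println(String.join(" ", cmd));
        // A real trigger would then run: new ProcessBuilder(cmd).start();
    }
}
```

Keeping the command construction in its own method makes the trigger easy to unit test without actually contacting an Oozie server.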
- Determined the viability of business problems for a Big Data solution.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Imported millions of structured records from relational databases into HDFS using Sqoop for processing.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed mappings using a data processor transformation to load data from Word and PDF documents into HDFS.
- Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregations.
- Fetched HQL results into CSV files and handed them over to the reporting team.
- Collaborated with team engineers to produce high quality code using Agile software development.
- Worked on POC for streaming data using Kafka and Spark Streaming.
- Created and maintained technical document of the life cycle to present at closure.
- Used Git for version control.
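One of the bullets above describes fetching HQL results into CSV files for the reporting team. The core of such an export is serializing result rows as properly quoted CSV lines, sketched below; the column names and values are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of turning query result rows into CSV lines for a reporting
// handover. Field names and sample values are illustrative only.
public class CsvExport {

    // Quote a field if it contains a comma or a quote, doubling embedded
    // quotes in the RFC 4180 style.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join one row's fields into a single CSV line.
    public static String toCsvLine(List<String> row) {
        return row.stream().map(CsvExport::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Header row followed by a data row with an embedded comma.
        System.out.println(toCsvLine(Arrays.asList("store_id", "region", "weekly_sales")));
        System.out.println(toCsvLine(Arrays.asList("1001", "Bentonville, AR", "52340.75")));
    }
}
```

Quoting matters here because values such as "Bentonville, AR" would otherwise split into two columns when the reporting team loads the file.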
- Worked on a live 30-node Hadoop cluster running Cloudera.
- Gained hands-on experience in Hadoop, Java, SQL, and Python.
- Imported and exported data between HDFS and an Oracle database using Sqoop.
- Used Flume to collect, aggregate, and store web log data from different web servers and network devices, and pushed it to HDFS.
- Loaded and transformed large sets of semi-structured and unstructured data, including Sequence files and XML files; worked with Avro and Parquet file formats using compression codecs such as Snappy, Gzip, and Zlib.
- Experienced in processing unstructured data using Pig and Hive.
- Worked with Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Experienced in writing MapReduce programs to analyze data as per business requirements.
- Experienced in using Hive and Pig as ETL tools for event joins, filters, transformations, and pre-aggregations.
- Evaluated Oozie for workflow orchestration in the automation of MapReduce jobs, Pig and Hive jobs.
- Developed JUnit test cases to validate the results of MapReduce analysis.
- Experienced in managing and reviewing Hadoop log files.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
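The map-side joins mentioned above avoid a shuffle by holding the small dimension table in memory and probing it per fact row. A toy plain-Java illustration of that idea follows; the table contents and class name are invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the idea behind a Hive map-side join: the small
// dimension table lives in an in-memory hash map, so each fact row is
// joined with a lookup instead of a shuffle. Data is invented.
public class MapSideJoin {

    // Inner-join fact rows {key, measure} against the small dimension map.
    public static List<String> join(Map<String, String> smallDim, List<String[]> factRows) {
        List<String> out = new ArrayList<>();
        for (String[] fact : factRows) {
            String dimValue = smallDim.get(fact[0]); // in-memory lookup
            if (dimValue != null) {                  // inner join: drop misses
                out.add(fact[0] + "," + dimValue + "," + fact[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dim = new HashMap<>();
        dim.put("US", "United States");
        dim.put("IN", "India");
        List<String[]> facts = Arrays.asList(
                new String[]{"US", "120"},
                new String[]{"FR", "35"},   // no dimension match, dropped
                new String[]{"IN", "80"});
        join(dim, facts).forEach(System.out::println);
    }
}
```

In Hive the same effect is achieved by letting the planner broadcast the small table to every mapper, which is why it only pays off when one side of the join is small.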
- Involved in preparing design, development, and analysis documents shared with clients.
- Analyzed and designed object models using Java/J2EE design patterns across the application tiers.
- Analyzed client requirements and designed the specification document based on them.
- Applied J2EE design patterns such as Business Delegate, DAO, and Singleton.
- Worked with the Maven build tool to build the project.
- Wrote SQL queries, PL/SQL, and stored procedures as part of database interaction.
- Used DispatchAction to group related actions into a single class.
- Provided testing and production support for a core Java, multithreaded ETL tool for distributed loading of XML data into an Oracle 10g database using JPA/Hibernate.
- Utilized frameworks such as Hibernate and Spring for the persistence and application layers.
- Attached an SMTP server to the system to handle dynamic e-mail dispatches.
- Defined and developed Action and Model classes.
- Used the Spring Framework and configured dependency injection for the Action classes via ApplicationContext.xml.
- Configured and deployed the application on the Tomcat application server.
- Used Log4j to implement logging facilities. Used Git for version control.
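Of the J2EE patterns applied above, the Singleton is the simplest to sketch. A minimal eager-initialization version is shown below; the ConnectionRegistry name is hypothetical, not a class from the project:

```java
// Sketch of the Singleton pattern as commonly applied in J2EE service
// layers. The ConnectionRegistry name is illustrative only.
public class ConnectionRegistry {

    // Eagerly created single instance; the private constructor prevents
    // callers from creating additional copies.
    private static final ConnectionRegistry INSTANCE = new ConnectionRegistry();

    private ConnectionRegistry() { }

    // Every caller receives the same shared instance.
    public static ConnectionRegistry getInstance() {
        return INSTANCE;
    }

    public static void main(String[] args) {
        // Both calls return the same object.
        System.out.println(getInstance() == getInstance()); // prints "true"
    }
}
```

Eager initialization via a `static final` field is thread-safe because the JVM guarantees the class is initialized exactly once.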