- Around 8 years of extensive IT experience in all phases of the Software Development Life Cycle (SDLC), including 4+ years of strong experience with the Apache Hadoop ecosystem and Apache Spark.
- Hadoop Stack
- Worked extensively with Hadoop distributions such as Cloudera and Hortonworks.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node, and MRv1/MRv2 concepts.
- Experience importing data from RDBMS servers such as MySQL, Oracle and Teradata into HDFS and Hive, and exporting it back, using Sqoop.
- Experience in ingesting data from FTP/SFTP servers using Flume.
- Experience developing Kafka consumers in Spark applications written in Scala.
- Data Processing
- Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
- Experience designing table partitioning and bucketing, and optimizing Hive scripts using various performance utilities and techniques.
- Experience developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
- Experience in designing tables and views for reporting using Impala.
- Experienced in developing Spark applications using the Spark Core, Spark SQL and Spark Streaming APIs.
- Experience creating DStreams from sources such as Flume and Kafka and performing Spark transformations and actions on them.
- Work Flows
- Rich experience in automating Sqoop and Hive queries using Oozie workflow.
- Experience scheduling jobs using Oozie coordinators, bundles and crontab.
- Cloud Infrastructure
- Experience with AWS components such as EC2 instances, S3 buckets and CloudFormation templates.
- File Formats
- Experienced in working with different file formats: Avro, Parquet, RCFile and ORC.
- Experience with different compression codecs such as Gzip, LZO, Snappy and Bzip2.
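As a concrete illustration of the Sqoop-based ingestion described above, a typical import invocation looks roughly like the following sketch (the JDBC URL, credentials, table name and HDFS paths are hypothetical placeholders; this assumes a Hadoop cluster with Sqoop installed):

```shell
# Hypothetical Sqoop import from MySQL into HDFS as Avro
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --as-avrodatafile \
  --num-mappers 4
```

The `--as-avrodatafile` flag lands the data in Avro format, and `--num-mappers` controls the parallelism of the import.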
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, Zookeeper, Spark, Cloudera and Hortonworks
Hadoop Paradigms: Map Reduce, YARN, In-memory computing, High Availability, Real-time Streaming
Programming Languages: SQL, Java, J2EE, Scala and Unix shell scripting
Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; familiar with NoSQL (HBase)
Cloud Components: AWS (S3 buckets, EMR, EC2, CloudFormation), Azure (SQL Database & Data Factory)
Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira.
Confidential - Plano, TX
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing that data in HDFS.
- Developed Spark applications to import data from Teradata into HDFS and created Hive tables.
- Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from the Avro tables.
- Involved in running Hive scripts through Hive, Hive on Spark and Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job with an HTTP source and an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates on Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
- Developed Kafka consumer APIs in Scala to consume data from Kafka topics.
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and SVN to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
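The partitioned, bucketed Parquet tables with Snappy compression mentioned above are typically declared roughly as follows (table and column names are illustrative placeholders; this requires a Hive metastore, so it is a sketch rather than a runnable script):

```sql
-- Illustrative Hive DDL; table and column names are hypothetical
CREATE TABLE sales_parquet (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (order_id) INTO 32 BUCKETS
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');

-- Load from an Avro-backed staging table using dynamic partitioning
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_parquet PARTITION (order_date)
SELECT order_id, amount, order_date FROM sales_avro;
```

Partitioning by `order_date` prunes scans for date-bounded queries, while bucketing by `order_id` helps bucketed map joins and sampling.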
Confidential - SFO, CA
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Worked with different source data file formats like JSON, CSV, TSV etc.
- Imported data from sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Used Pig as an ETL tool for transformations, event joins and pre-aggregations before storing the data in HDFS.
- Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
- Built Oozie workflow scheduler templates to manage various jobs such as Sqoop, MR, Pig, Hive and shell scripts.
- Involved in importing and exporting data from HBase using Spark.
- Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.
- Actively participated in code reviews and meetings and resolved technical issues.
Environment: Apache Hadoop, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.
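An Oozie workflow of the kind described above chains Sqoop and Hive actions roughly as in this sketch (the workflow name, properties and script paths are hypothetical placeholders):

```xml
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table orders --target-dir ${rawDir}</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Ingest failed</message></kill>
  <end name="end"/>
</workflow-app>
```

An Oozie coordinator would then trigger this workflow on a daily schedule, passing the parameterized properties at submission time.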
Confidential - Milwaukee, WI
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
- Imported datasets with Sqoop from sources such as Oracle and MySQL into HDFS and Hive on a daily basis.
- Installed and configured Hive on Hadoop cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed and ran MapReduce jobs on the YARN/Hadoop cluster to produce daily and monthly reports per business requirements.
- Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
- Developed multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON and CSV.
- Developed Hive views for requirement analysis and created Hive tables to store the processed data.
- Comprehensive knowledge and experience in process improvement, normalization/denormalization, data extraction, data cleansing and data manipulation.
- Performed Data transformations in Hive and used partitions, buckets for performance improvements.
- Utilized cluster co-ordination services through ZooKeeper.
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with business users.
Environment: MapReduce, Java, Hadoop, Cloudera, Pig, Hive, Oozie, Sqoop, Oracle, ZooKeeper, Eclipse and UNIX Shell Scripting.
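The MapReduce-style cleansing and aggregation described above can be illustrated outside Hadoop with a plain-Java sketch of the same map (tokenize) / reduce (count per key) pattern; the class and method names are hypothetical, not from an actual project codebase:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Splits lines into lowercase words and counts occurrences -- the same
    // pattern a Hadoop word-count job implements with a Mapper emitting
    // (word, 1) pairs and a Reducer summing them, here via Java streams.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hive and Spark", "Spark on YARN");
        System.out.println(wordCounts(lines));
    }
}
```

In a real Hadoop job the grouping step is performed by the shuffle phase between Mapper and Reducer rather than in memory.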
- Participated in the requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
- Involved in developing the application using Core Java, J2EE and JSPs.
- Worked on this web-based application, entitled EMR, in a J2EE framework using Hibernate for persistence, Spring for dependency injection and JUnit for testing.
- Integrated REST APIs with Spring, consuming resources using Spring RestTemplate, and developed RESTful web services interfacing with a Java-based runtime engine and accounts.
- Used JSP to develop the front-end screens of the application.
- Built the admin module using Struts framework for the master configuration.
- Used Struts tiles to display the front-end pages in a neat and efficient way.
- Designed and developed several SQL Scripts, Stored Procedures, Packages and Triggers for the Database.
- Developed nightly batch jobs which involved interfacing with external third party state agencies.
- Developed test scripts for performance and accessibility testing of the application.
- Responsible for deploying the application in client UAT environment.
- Prepared installation documents of the software, including Program Installation Guide and Installation Verification Document.
- Involved in different types of testing, such as unit, system and integration testing, carried out during the testing phase.
- Provided production support to maintain the application.
Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, MyEclipse, PL/SQL, WebSphere, UML, Toad, Windows.
Jr. Java Developer
- Involved in designing and coding.
- Used RAD to develop, test and deploy all the Java components.
- Developed (specified, created, modified, maintained and tested) software components that are part of the software project on the assigned technology platform.
- Corrected complicated defects and made major enhancements to resolve customer problems.
- Developed Presentation Screens using Struts view tags.
- Developed scalable applications in a dynamic environment, primarily using Java, Spring, web services and object/relational mapping tools.
- Worked in both UNIX and Windows environments.
- Developed and modified databases as needed to support application development, and continually provided support for internally developed applications.
- Developed technical architecture documentation based on business requirements.
- Enhanced and maintained the existing application suite.
- Communicated development status on a regular basis to technology team members.
Environment: Java Servlets, J2EE, Spring, Struts, Hibernate, Eclipse IDE, RAD, JDBC, Web Services, SQL, HTML, DHTML, XSLT, Oracle, SOAP, Agile (Scrum) and CSS.