- 7+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data/Hadoop ecosystem technologies, SQL, Java, and J2EE.
- Experienced Hadoop developer with a strong background in distributed file systems in the big data arena. Understands the complex processing needs of big data and has experience developing code and modules to address those needs.
- Certified developer on Apache Hadoop. Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using various distributions such as Apache and Cloudera.
- Excellent working knowledge of the HDFS file system and Hadoop daemons such as ResourceManager, NodeManager, NameNode, DataNode, Secondary NameNode, and containers.
- Good experience with MapReduce, HDFS, YARN, Python, Sqoop, HBase, Oozie, Hadoop Streaming, and Hive.
- Solid understanding of the Hadoop distributed file system. Extensive knowledge of ETL tools, including Ab Initio and Informatica. Excellent oral and written communication skills; collaborates well across technology groups.
- Experience in setting up Hadoop clusters on cloud platforms like AWS.
- Customized dashboards and handled identity and access management (IAM) in AWS.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS) EC2 and S3.
- Used Git for source code and version control management.
- Worked on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON, and CSV formats.
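The serialization point above can be sketched with standard-library tooling; a minimal Python example (the record fields are hypothetical, and since Avro/Parquet need third-party libraries, only JSON and CSV are shown):

```python
import csv
import io
import json

# A hypothetical click-stream record; field names are illustrative only.
record = {"user_id": 42, "page": "/home", "duration_ms": 730}

# JSON: self-describing, supports nested structures.
json_blob = json.dumps(record, sort_keys=True)

# CSV: flat and compact; the schema travels out-of-band (here, the header row).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user_id", "page", "duration_ms"])
writer.writeheader()
writer.writerow(record)
csv_blob = buf.getvalue()

print(json_blob)
print(csv_blob.strip())
```

The same trade-off drives format choice on HDFS: self-describing formats (JSON, Avro) ease evolution, while columnar/flat ones (Parquet, CSV) favor scan speed and size.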
- Monitored MapReduce jobs and YARN applications.
- Strong experience installing and working with NoSQL databases such as HBase and Cassandra.
- Experience developing custom UDFs in Java to extend Hive.
- Experience using integrated development environments such as Eclipse, NetBeans, and JDeveloper.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMS including IBM DB2, Oracle 10g, and MS SQL Server 2005/2008, with strong database skills including SQL, stored procedures, and PL/SQL.
- Vast experience with Scala and Python. In-depth understanding of MapReduce and the Hadoop infrastructure. Focuses on the big picture when problem-solving.
- Developed applications using Spark for data processing. Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis.
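As a rough illustration of why DataFrame-style code replaced hand-written MapReduce, here is a pure-Python analogy (not actual Spark code) of the same word count in both styles:

```python
from collections import Counter, defaultdict

lines = ["spark replaces mapreduce", "spark is fast"]

# MapReduce style: explicit map, shuffle (group by key), and reduce phases.
mapped = [(word, 1) for line in lines for word in line.split()]
shuffled = defaultdict(list)
for word, one in mapped:
    shuffled[word].append(one)
reduced = {word: sum(ones) for word, ones in shuffled.items()}

# DataFrame/actions style: declare the aggregation and let the engine plan it.
counted = Counter(word for line in lines for word in line.split())

assert reduced == dict(counted)
```

The declarative form is shorter and leaves room for an optimizer, which is the practical motivation for the migration described above.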
Hadoop/Big Data: HDFS, Spark, MapReduce, Pig, Hive, Sqoop, ORC, AWS, Avro, Talend Big Data, Oozie
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC
IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer, IBM Rational Application Developer (RAD)
Frameworks: MVC, Struts, JPA
Languages: C, Java, Scala, Python, Linux shell scripts, SQL, PL/SQL
Databases: Oracle, MySQL, DB2
Web Servers: JBoss, WebLogic, WebSphere, Apache Tomcat
Reporting Tools: OBIEE, MicroStrategy
ETL Tools: Informatica ETL
Data model and design tools: Microsoft Visio and Erwin
Sr Big Data Developer
Confidential, Baltimore, MD
- Contributing to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop and other big data technologies for leading organizations, using major Hadoop distributions such as Hortonworks and Cloudera.
- Working on Amazon AWS (EMR, EC2, RDS, S3, Redshift, etc.) and tools including Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Flume, and Spark.
- Working on loading log data directly into HDFS using Flume in Cloudera CDH. Involved in loading data from the Linux file system to HDFS in Cloudera CDH.
- Experience running Hadoop streaming jobs to process terabytes of XML-format data. Experience importing and exporting data into HDFS, and assisted in exporting analyzed data to RDBMS using Sqoop in Cloudera.
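A typical Sqoop import/export of the kind described above might look like this (connection string, tables, and paths are hypothetical, and the commands need a configured Hadoop cluster to run):

```shell
# Pull an RDBMS table into HDFS in parallel (hypothetical database and table).
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export the analyzed results back to the RDBMS.
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table order_metrics \
  --export-dir /data/refined/order_metrics
```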
- Installed and configured MapReduce, Hive, and HDFS. Developed Spark scripts in Java, per requirements, to read and write JSON files. Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Worked on Hadoop administration, development, and NoSQL in Cloudera; loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs. Automated all jobs that pull data from the FTP server into Hive tables using Oozie workflows.
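The Hive-table pattern above can be sketched in HiveQL (table name, columns, and paths are hypothetical; the query compiles internally to MapReduce jobs):

```sql
-- Hypothetical external table over data pulled from the FTP server.
CREATE EXTERNAL TABLE IF NOT EXISTS clickstream (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/clickstream';

-- Register a day's data, then query it.
ALTER TABLE clickstream ADD IF NOT EXISTS PARTITION (dt = '2016-03-01');

SELECT url, COUNT(*) AS hits
FROM clickstream
WHERE dt = '2016-03-01'
GROUP BY url;
```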
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team, using Sqoop to export data into HDFS and Hive. Configured and installed Hadoop and its ecosystem components (Hive/Pig/HBase/Sqoop/Flume).
- Designed and implemented a distributed data storage system based on HBase and HDFS. Importing and exporting data into HDFS and Hive.
- Designed and implemented a data warehouse, creating fact and dimension tables and loading them with Informatica PowerCenter tools, fetching data from the OLTP system into the analytics data warehouse. Coordinated with business users to gather new requirements and worked on existing issues. Worked on reading multiple data formats on HDFS using Scala. Loaded data into Parquet files, applying transformations using Impala. Executed parameterized Hive, Impala, and UNIX batches in production.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala. Analyzed SQL scripts and designed solutions implemented in Scala.
- Investigated any issues that came up. Experienced in resolving issues through root cause analysis, incident management, and problem management processes.
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Designed and developed integration APIs using various data structure concepts and the Java Collections Framework, with an exception handling mechanism, to return responses within 500 ms. Used Java threading to handle concurrent requests.
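The concurrency idea behind that latency budget (fan independent lookups out across threads so the response time is bounded by the slowest call, not the sum) can be sketched in Python; the original work was Java, and the backend names here are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical backend lookups; stand-ins for the real API's data sources.
def fetch(source):
    return {"source": source, "rows": len(source)}

sources = ["orders", "inventory", "pricing"]

# Run the independent lookups concurrently, bounding the whole response
# by one timeout rather than summing per-call latencies.
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    futures = {pool.submit(fetch, s): s for s in sources}
    results = [f.result(timeout=0.5) for f in as_completed(futures, timeout=0.5)]

by_source = {r["source"]: r["rows"] for r in results}
```

In the Java version the same shape is achievable with an `ExecutorService` and futures resolved under a shared deadline.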
Environment: Hadoop 1.x/2.x MR1, Cloudera CDH3U6, HDFS, Spark, Scala, Impala, HBase 0.90.x, Flume 0.9.3, Java, Sqoop 2.x, Hive 0.7.1, Tableau (Online, Desktop, Public).
Sr Hadoop Developer
Confidential, San Diego, CA
- Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
- Developed a custom input adaptor utilizing the HDFS FileSystem API to ingest clickstream log files from an FTP server into HDFS.
- Developed end-to-end data pipeline using FTP Adaptor, Spark, Hive and Impala.
- Implemented Spark and utilized Spark SQL heavily for faster development, and processing of data.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark with Scala.
- Used the Scala collections framework to store and process complex consumer information.
- Implemented a prototype for real-time streaming of data using Spark Streaming with Kafka.
- Handled importing other enterprise data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HBase tables.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
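The map-side join mentioned above is easy to illustrate: a small dimension table is shipped to every mapper (e.g. via the distributed cache), so each mapper joins its rows locally and no shuffle of the large table is needed. A pure-Python sketch with hypothetical table contents:

```python
# Small dimension table, cached on every mapper.
dim_products = {1: "laptop", 2: "phone"}

# Large fact table, streamed through the mappers.
fact_sales = [
    {"product_id": 1, "qty": 3},
    {"product_id": 2, "qty": 5},
    {"product_id": 1, "qty": 1},
]

# Each "mapper" joins its rows locally against the cached dict --
# the whole join happens in the map phase, with no reduce-side shuffle.
joined = [
    {"product": dim_products[row["product_id"]], "qty": row["qty"]}
    for row in fact_sales
]
```

This is the same idea Hive exposes as a map join hint / `hive.auto.convert.join` when one side of the join is small enough to fit in memory.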
- Created, validated, and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate Sqoop jobs on weekly and monthly schedules.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
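A coordinator of the kind described above might look like this (app name, dates, and workflow path are hypothetical; only the structure follows the Oozie coordinator schema):

```xml
<!-- Hypothetical coordinator that triggers a Sqoop workflow weekly. -->
<coordinator-app name="weekly-sqoop-load" frequency="${coord:days(7)}"
                 start="2016-01-03T02:00Z" end="2016-12-25T02:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>hdfs:///apps/oozie/sqoop-load-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```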
- Continuously monitored and managed the Hadoop cluster.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Experience with data wrangling and creating workable datasets.
Environment: HDFS, Pig, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology
Confidential, Saint Louis - MO
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and Sqoop.
- Implemented a real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk, and Greenplum.
- Designed and developed Informatica BDE applications and Hive queries to ingest data into the landing raw zone, transform it with business logic into the refined zone, and load it into Greenplum data marts for the reporting layer, consumed through Tableau.
- Installed, configured, and maintained big data technologies and systems. Maintained documentation and troubleshooting playbooks.
- Automated the installation and maintenance of Kafka, Storm, ZooKeeper, and Elasticsearch using SaltStack.
- Developed connectors for Elasticsearch and Greenplum to transfer data from Kafka topics. Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams (KStreams) applications in Java for real-time data processing.
- Responded to and resolved access and performance issues. Used the Spark API over Hadoop to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and YARN.
- Imported and exported data into HDFS and Hive using Sqoop, and developed POCs on Apache Spark and Kafka. Proactively monitored performance and assisted in capacity planning.
- Worked on the Oozie workflow engine for job scheduling. Imported and exported data into MapReduce and Hive using Sqoop.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Good understanding of performance tuning with NoSQL, Kafka, Storm, and SQL technologies.
- Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs.
- Worked on data transformation pipelines such as Storm. Worked with operational analytics and log management using ELK and Splunk. Assisted teams with SQL and MPP databases such as Greenplum.
- Worked on SaltStack automation tools. Helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, HDFS).
Environment: Java, Confluent Kafka, HDFS, Storm, Elasticsearch, Salt scripting, Greenplum, KStreams, KTables, Splunk, Hadoop.
Confidential, Seattle, WA
- Involved in loading data from UNIX file system to HDFS. Imported and exported data into HDFS and Hive using Sqoop.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
- Analyzed large data sets to determine the optimal way to aggregate and report on them. Provided quick responses to ad hoc internal and external client requests for data, and experienced in creating ad hoc reports.
- Responsible for building scalable distributed data solutions using Hadoop. Worked hands-on with the ETL process.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Extracted data from Teradata into HDFS using Sqoop. Analyzed the data by running Hive queries and Pig scripts to understand user behavior (e.g., shopping enthusiasts, travelers, music lovers).
- Exported the analyzed patterns back into Teradata using Sqoop. Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs. Developed Hive queries to process the data and generate data cubes for visualization.
Environment: Hive, Pig, Apache Hadoop, Cassandra, Sqoop, Big Data, HBase, ZooKeeper, Cloudera, CentOS, NoSQL, Sencha Ext JS, JavaScript, AJAX, Hibernate, JMS, WebLogic Application Server, Eclipse, Web Services, Azure, Project Server, UNIX, Windows.
- Interacted with the business users in gathering the requirements
- Analyzed the requirements and converted them into code.
- Prepared various documents related to requirement analysis.
- Developed user interfaces as per the business requirements.
- Developed DAO layer methods which connect/interact with the database.
- Developed batch jobs for various business needs and deployed them on to multiple environments.
- Prepared deployment documents for the application and deployed it in multiple environments including development, test and production.
- Developed unit test cases for the code written and ensured the code was fully covered.
- Worked on multiple applications within the same application suite and worked on business user stories for all of them.
- Interacted with end users as and when they reported application-related issues in the production environment.
- Fixed application-level issues raised by end users while they were using the application in production.
Environment: Java, J2EE, EJB, Shell Scripting, UNIX, Oracle, WebSphere Application Server, Tivoli Workload Scheduler.
- Worked closely with management to identify the requirements and prepared various kinds of documents.
- Wrote technical specifications for the product using Microsoft Visio.
- Created unit test case suite using JUnit.
- Developed a desktop application using Swing and AWT that reads screen coordinates to capture images effectively.
- Executed the test suite against the product developed.
- Deployed the product on to different environments.
- Fixed the issues reported by users across the organization.
- Gave presentations of the product and introduced it to a large audience across the organization.
Environment: Windows, Java, AWT, Swing.