- 6 years of experience in the Information Technology (IT) industry, including 2+ years of hands-on experience with Big Data ecosystem technologies such as Hadoop, MapReduce, Spark, Hive, HBase, Sqoop, Kafka, Oozie, Cassandra and Flume.
- Skilled at developing new applications on Hadoop to meet business needs and at migrating existing applications to the Hadoop environment.
- Used Apache NiFi to transform data moving between different components of the Big Data ecosystem.
- Used Spark Streaming and Kafka to process real-time data.
- Wrote custom UDFs in Java to extend Hive's core functionality.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Worked with relational databases including MySQL and Oracle.
- Worked with NoSQL databases including HBase, MongoDB and Cassandra
- Developed simple to complex MapReduce and streaming jobs in Scala and Java for data cleansing, filtering and aggregation.
- Extensive hands-on experience with Java, Python and Scala.
- Proficient in writing HiveQL and SQL queries for data manipulation.
- Transformed data across formats such as SequenceFile, flat files, XML, JSON, Avro, Parquet and relational tables.
- Strong in core Java, including Object-Oriented Design (OOD), the Collections Framework, exception handling and the I/O system.
- Adept at using Sqoop to migrate data between RDBMS, NoSQL databases and HDFS
- Implemented real-time read/write access to very large datasets using HBase.
- Consolidated MapReduce jobs by reimplementing them in Spark.
- Experience with Apache Spark with Scala, Python and Java
- Good knowledge of scheduling batch job workflows using Oozie.
- Familiar with development tools and methodologies such as JIRA, Agile/Scrum and Waterfall.
- Experience in collecting, aggregating and moving large amounts of streaming data using Flume, Kafka and Spark Streaming.
- Demonstrated ability to communicate, gather requirements and partner with enterprise architects, business users, analysts and development teams to deliver rapid iterations of complex solutions.
- Proficient in data visualization, having created multiple dashboards in Tableau.
Hadoop Ecosystem: Apache Hadoop 2.5, Hive, Pig, HBase, Sqoop, Spark 1.6, Kafka, Oozie, ZooKeeper
Databases: Oracle, MySQL, SQL, MongoDB, Cassandra
Languages: Python, Java, Scala, SQL, R
Visualization: Tableau, R
Web Technologies: HTML, CSS
Confidential, Dallas, TX
Big Data Engineer
- Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest data into HDFS for analysis.
- Wrote design documents that evaluated the possible approaches and identified the best one.
- Aggregated data and stored the results in HDFS and HBase.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala
- Responsible for collecting incoming data in real time and processing it with Spark Streaming and Spark SQL.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Developed scripts to automate end-to-end data management and synchronization across all clusters.
- Experienced with SparkContext, Spark SQL, DataFrames and pair RDDs.
- Developed functional programs in Scala to connect to the streaming data application and gather web data.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked in an Agile environment and communicated effectively with different levels of management.
Environment: Hadoop 2.5, Hive 1.2, Pig 0.16.0, Spark 1.6, Scala 2.11.8, MapReduce, HBase 1.1.2, Sqoop 1.4.6, Kafka 0.10.0.1
Confidential, Dallas, TX
Big Data Engineer
- Worked on creating a production data lake on the Hadoop ecosystem to handle transactional processing operations.
- Built the data ingestion layer using Spark and Sqoop on a distributed cluster.
- Migrated data from various relational platforms to Hadoop and built a data warehouse on the Hadoop ecosystem using Hive, Oozie and Sqoop.
- Built an ETL pipeline with Sqoop and Hive to regularly bring in data from the source and make it available for consumption.
- Configured periodic incremental imports of data from Oracle into HDFS using Sqoop.
- Worked extensively with structured data using HiveQL, including join operations, custom UDFs and Hive query optimization.
- Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Loaded and transformed large sets of structured data using Spark.
- Gathered requirements from the client and estimated timelines for developing complex Hive queries for logistics applications.
- Created Hive tables, loaded data and wrote Hive queries to analyze user request patterns, and implemented performance optimizations including partitioning and bucketing.
- Set up Oozie workflows for Hive and Sqoop actions.
- Helped design HDFS storage with an efficient number of block replicas.
Environment: Hive 1.2, Sqoop 1.4.6, Hadoop 2.5, Oozie 4.2.0, Spark 1.6, Oracle, Scala 2.11.8
- Implemented scalable applications for information identification, extraction, analysis and retrieval.
- Directed software design and development while remaining focused on client needs.
- Collaborated closely with other team members to plan, design and develop robust
- Interfaced with business analysts, developers and technical support to determine optimal specifications.
- Evaluated the interface between hardware and software.
- Advised customers regarding maintenance of diverse software systems.
Environment: Ubuntu Linux, Python, OpenCV, Twilio, Raspberry Pi
- Designed a dynamic, interactive website that ensured a positive customer experience, resulting in a 40% increase in revenue.
- Developed, tested and debugged software tools.
- Implemented website functionality using class-based views and models that store data in an SQLite database.
- Developed the website using Python and the Django web framework, with Django template tags in HTML templates, JavaScript and Bootstrap on the front end.
- Implemented test programs and evaluated existing engineering processes.
- Designed and configured the database and back-end application programs.
- Performed research to explore and identify new technological platforms.
- Collaborated with internal teams to convert end user feedback into meaningful and improved solutions.
- Resolved ongoing problems and accurately documented project progress.
Junior Java Developer
- Participated in requirements analysis and preparation of design documents.
- Developed core modules such as ticket reservation, payment, user registration and hotel reservation.
- Developed the application as per the functional requirements from the analysts.
- Integrated SOAP web services and mapped the responses for display in the user interface.
- Helped design the entire database for the application.
- Helped develop the persistence layer using JDBC, SQL and stored procedures.
- Deployed the application to a JBoss server.
- Used Subversion (SVN) for version control of source code check-ins and check-outs.
- Participated in scrum meetings as a part of Agile Methodology.