- 3+ years of experience in Information Technology, including proven hands-on experience in Big Data and Analytics.
- 3+ years of comprehensive experience as a Hadoop Developer with a focus on development.
- 3+ years of comprehensive experience with Amazon Web Services.
- Expertise in business process modeling using Use Case, Workflow, Sequence, Structured, Activity, Dataflow, and Process Flow diagrams.
- In-depth understanding and use of the Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in analyzing data using Apache Spark and Hive.
- Knowledge of NoSQL databases such as AWS DynamoDB.
- Designed real-time data streaming systems for both synchronization and analysis using frameworks such as Spark Streaming and Logstash.
- Ability to blend technical expertise with strong conceptual, business, and analytical skills to deliver quality solutions and results-oriented problem solving.
- Handled several techno-functional responsibilities, including estimation, identification of functional and technical gaps, requirements gathering, solution design, development, product documentation, and production support.
- A passionate and motivated professional with excellent interpersonal and communication skills, strong business acumen, creative problem-solving skills, technical competency, team-player spirit, and leadership skills.
Operating Systems: Windows, Linux
Frameworks: Cloudera, Hortonworks, Amazon Web Services
Big Data Technologies: Apache Spark, HDFS, MapReduce, Hive, Tez, Sqoop, Oozie
Programming Languages: Core Java, Scala, SQL
Search Engines: Elasticsearch, Logstash
Reporting Tools: Tableau, Kibana
Confidential, Washington, DC
Big Data Analyst/Developer
- Developed an electronic data quality validation system for the new Workforce Investment Opportunities Act (WIOA) Cloud Platform Services (CPS) - Workforce Integrated Performance System (WIPS).
- Worked extensively with AWS Elastic MapReduce (EMR) to develop and run streaming and non-streaming Spark applications that validate and aggregate data.
- Designed and implemented the data modernization project pipeline on AWS.
- Headed the migration effort from legacy systems to Hadoop services at the Employment and Training Administration.
- Facilitated requirement gathering sessions with the Business Users for the modernization and integration of several grant performance reporting systems.
- Documented business and functional requirements specifications for the design of a Cloud Provider Service - Performance Review System (CPS-PRS).
- Provided extensive technical assistance for developing end-user training materials and testing online performance of the Workforce Integrated Performance Systems (WIPS).
- Indexed production and development security log data into Elasticsearch using Logstash.
- Prepared reports using Kibana for the Security team to give insight into unauthorized user access to the production and development cluster.
- Prepared dashboard reports for stakeholders using Tableau to monitor the performance of Workforce Investment Act (WIA) programs.
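The log-indexing work above might be configured along these lines; this is a minimal sketch, and the file paths, grok pattern, host, and index name are illustrative assumptions rather than the actual pipeline:

```conf
# Minimal Logstash pipeline sketch: ship security logs into Elasticsearch.
# Paths, pattern, host, and index name are placeholders.
input {
  file {
    path => "/var/log/secure*"
    start_position => "beginning"
  }
}
filter {
  grok {
    # Parse syslog-style lines; the pattern is an assumption about the log format.
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:loghost} %{GREEDYDATA:event}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "security-logs-%{+YYYY.MM.dd}"
  }
}
```

The daily index suffix keeps indices small and lets Kibana dashboards query a bounded time range, which suits the unauthorized-access reports described above.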
Confidential, Tempe, AZ
Big Data Analyst/Developer
- Developed statistical models and algorithms using Apache Spark for processing large volumes of genomic data.
- Acted as the prime contact between the Bioinformatics Research Group, which included members of the National Cancer Institute, and the development teams.
- Conducted rapid software prototyping to demonstrate and evaluate technologies in relevant environments.
- Actively participated on teams of software developers, researchers, designers, and technical leads to understand challenges, needs, and possible solutions.
- Decomposed and translated business requirements into functional and non-functional requirements, and created the System Requirements Specification document for the Expression Quantitative Trait Loci (EQTL) pipeline.
- Generated test plans and test cases from functional requirements, and created a System Quality Assurance plan (incorporating test plans and test cases).
- Handled various types of genomic data formats (BAM and SAM) coming from National Cancer Institute (NCI).
- Prepared custom reports for the biomedical community with specific interests in tumor simulations.
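To give a flavor of the format handling above: SAM alignment lines are tab-delimited text, so record parsing can be sketched in plain Java. The class and method names here are hypothetical illustrations, not code from the actual pipeline; the field layout follows the public SAM specification.

```java
// Sketch: parsing the leading mandatory tab-delimited columns of a SAM
// alignment line. Names are hypothetical; layout follows the SAM spec.
public class SamRecord {
    final String qname;  // query (read) name
    final int flag;      // bitwise FLAG
    final String rname;  // reference sequence name
    final int pos;       // 1-based leftmost mapping position
    final int mapq;      // mapping quality
    final String cigar;  // CIGAR string

    SamRecord(String qname, int flag, String rname, int pos, int mapq, String cigar) {
        this.qname = qname; this.flag = flag; this.rname = rname;
        this.pos = pos; this.mapq = mapq; this.cigar = cigar;
    }

    static SamRecord parse(String line) {
        String[] f = line.split("\t");
        return new SamRecord(f[0], Integer.parseInt(f[1]), f[2],
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), f[5]);
    }

    // FLAG bit 0x4 set means the read is unmapped.
    boolean isMapped() { return (flag & 0x4) == 0; }

    public static void main(String[] args) {
        SamRecord r = parse("read1\t0\tchr1\t100\t60\t50M");
        System.out.println(r.rname + ":" + r.pos + " mapped=" + r.isMapped());
    }
}
```

In a Spark job, a parser like this would typically be applied per line of a text-converted BAM/SAM input split.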
Confidential, Charlotte, NC
- Responsible for managing data from multiple sources such as the Centers for Medicare and Medicaid Services (CMS).
- Developed MapReduce jobs in Java for data cleaning and pre-processing.
- Extracted data from Oracle, PostgreSQL, and Netezza using Sqoop, loaded it into HDFS, and processed it there.
- Created Oozie workflows and coordinator jobs to recurrently trigger Hadoop jobs (Java MapReduce, Pig, Hive, Sqoop) and system-specific jobs (Java programs and shell scripts) based on time (frequency) and data availability.
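As a sketch of the coordinator pattern above, an Oozie coordinator can gate a workflow on both a schedule and input-data availability; the app name, dates, paths, and frequency below are illustrative assumptions, not the actual job definitions:

```xml
<coordinator-app name="daily-etl-coord" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="raw-input" frequency="${coord:days(1)}"
             initial-instance="2016-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs:///data/raw/${YEAR}${MONTH}${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- Fire only once the current day's input partition has landed. -->
    <data-in name="input" dataset="raw-input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs:///apps/oozie/daily-etl-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The time trigger comes from `frequency` on the coordinator, while the data-availability trigger comes from the `input-events` block; the workflow runs only when both conditions hold.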