Hadoop Big Data Developer Resume
SUMMARY
- 5 Years and 9 Months of professional IT work experience in Analysis, Design, Development, Deployment and Maintenance of critical software and big data applications.
- 4+ years of experience as a Big Data Application Designer (following Agile methodology) and 2 years of experience as a PL/SQL developer.
- 4+ years of end-to-end experience in Big Data application design, with strong experience in major Hadoop ecosystem components such as Apache Hadoop MapReduce, Storm, Spark, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Ambari, and Kafka, using Python and Scala.
- 3+ years of experience with NoSQL databases (HBase, MongoDB) and search engines such as Solr and Elasticsearch.
- Anchored artifacts for multiple milestones (application design, code development, testing, and deployment) across the software lifecycle.
- Expertise in building applications on Apache Storm using the Java and Scala programming languages.
- Skilled in developing Apache Spark programs in Python (PySpark) to build topologies; well versed in Datasets and DataFrames.
- Expertise in Apache Hive queries, partitioning, bucketing, and optimization.
- Worked on NoSQL databases such as HBase and real-time streaming platforms such as Kafka.
- Worked on parsing alarm data, storing it in the Solr search engine, and querying Solr efficiently based on requirements.
- Experience in creating data pipelines to move data between RDBMS and HDFS using Sqoop for improved business intelligence and reporting (see the sketch after this list).
- Experience with the Oozie workflow engine in running workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
- Created statistical reports in Banana dashboards using data indexed in Apache Solr.
- Developed Apache Spark programs in Python (PySpark) to establish connections between MongoDB and software applications.
- Accumulated EEIM alarm data in the NoSQL database MongoDB and retrieved it when necessary.
- Experience in processing system logs with Logstash, storing them in Elasticsearch, and creating dashboards in Kibana.
- Experience in developing Apache Spark, Hive, and Python code in the Apache Zeppelin notebook tool.
- Strong knowledge of telecommunication terminology and functionality.
- Familiar with extracting, transforming, and loading data into Hadoop clusters.
- Expertise in developing Pig scripts to create relationships between the data sets present in the Hadoop cluster.
- Developed Apache MapReduce programs in Ruby to map data to the production environment.
- Very good understanding of the HDFS and YARN architectures.
- Strong knowledge of clinical information systems/EMRs/EHRs and health care files such as PGF, HL7, 837, and claims.
- Expertise in Ruby and shell scripting.
- Experience in developing efficient solutions to analyze large data sets.
- In-depth understanding of data structures and algorithms.
- Expert in Excel, including pivot tables, VLOOKUP, charts, and data manipulation.
- Built, published, and scheduled customized interactive reports and dashboards using Tableau Server.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Worked with Apache Spark, Python, Elasticsearch, and data visualization tools such as Kibana.
- Experienced with Jira and Bitbucket, source control systems such as Git and SVN, and development tools such as Jenkins and Artifactory.
- Expertise in developing, deploying, and managing Hadoop clusters using distributions such as Cloudera (CDH4) and Hortonworks (HDP 2.3.0, HDP 2.6.0).
- Hardware design and development knowledge in the Internet of Things (IoT).
- Worked on implementing a data lake and was responsible for data management within it.
- Proficient in creating and modifying database objects such as stored procedures, functions, packages, and triggers using SQL, PL/SQL, and T-SQL.
- Created database objects such as tables, synonyms, sequences, constraints, and views.
- Can-do attitude toward problem solving, with proactive thinking and strong analytical, programming, and communication skills.
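Below is a minimal, illustrative PySpark sketch of the kind of RDBMS-to-HDFS ingestion step described above. The actual pipelines used Sqoop; this sketch only shows the equivalent idea through Spark's JDBC reader, and the JDBC URL, table name, credentials, and HDFS path are hypothetical placeholders.

```python
# Hypothetical RDBMS -> HDFS ingestion sketch (Sqoop was used in practice;
# this illustrates the same data movement with Spark's JDBC source).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms_to_hdfs_ingest").getOrCreate()

# Read a source table from the relational database over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")   # placeholder URL
          .option("dbtable", "orders")                        # placeholder table
          .option("user", "etl_user")
          .option("password", "etl_password")
          .load())

# Land the data in HDFS as Parquet, partitioned by date for downstream BI queries.
(orders.write.mode("overwrite")
       .partitionBy("order_date")
       .parquet("hdfs:///data/raw/sales/orders"))             # placeholder path
```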
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Spark, Storm, HDFS, HBase, Cassandra, MongoDB, Zookeeper, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Logstash and Zeppelin
Operating Systems: Windows, UNIX, LINUX, MAC.
Programming Languages: C++, Java, Scala, Python, Oracle PL/SQL, Ruby
Scripting Languages: JavaScript, Shell Scripting
Web Technologies: HTML, XHTML, XML, CSS, JavaScript, JSON, SOAP, WSDL.
Hadoop Distribution: Hortonworks, Cloudera.
Java/J2EE Technologies: Java, J2EE, JDBC.
Database: Oracle, MS Access, MySQL, SQL, NoSQL (HBase, MongoDB).
IDE/Tools: Eclipse, IntelliJ, SBT, DBeaver, DataGrip, SQL Developer, TOAD
Methodologies: J2EE Design Patterns, Scrum, Agile, Waterfall
Version Control: SVN, Git, GitHub, Bitbucket
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Big Data Developer
Responsibilities:
- Anchor artifacts for multiple milestones (application design, code development, testing, and deployment) across the software lifecycle.
- Develop an Apache Storm program to consume alarms streamed in real time from Kafka, enrich them, and pass them to the EEIM (End-to-End Incident Management) application.
- Create rules engines in Apache Storm and Apache Spark to categorize alarms into Detection, Interrogation, and Association types before processing.
- Develop Apache Spark programs in Python (PySpark) to establish a connection between MongoDB and the EEIM application (see the sketch after this list).
- Analyze alarms and enhance the EEIM application using Apache Storm to predict the root cause of each alarm and the exact device where the network failure occurred.
- Responsible for developing scripts in R with the H2O package to send data to the data science team to train the system for machine learning.
- Accumulate EEIM alarm data in the NoSQL database MongoDB and retrieve it when necessary.
- Build Fiber to the Neighborhood/Node (FTTN) and Fiber to the Premises (FTTP) topologies using Apache Spark and Apache Hive.
- Categorize real-time streaming alarms into matched, unmatched, and unparsed alarms and store them in the Apache Solr search engine.
- Responsible for developing the EEIM application as an Apache Maven project and committing the code to Git.
- Design EEIM applications that can send data to and receive data from the IBM Data Science tool.
- Review system performance and re-evaluate the platform by running complete system regression tests under heavy data load, capturing logs and performance metrics.
- Apply deep learning concepts to historical data and develop the machine learning module using the R programming language.
- Process system logs with Logstash, store them in Elasticsearch, and create dashboards in Kibana.
- Create statistical reports in Banana dashboards using data indexed in Apache Solr.
- Provide technical support for debugging, code fixes, platform issues, missing data points, unreliable data source connections, and big data transit issues.
- Review unit, integration, system, and regression test results of data pipelines in the development environment and provide the go/no-go decision for promoting the system to production.
- Conduct code reviews on a regular basis, or ad hoc / on demand when Confidential deems necessary.
- Create simulation tools and data sets for unit and integration testing.
- Provide advanced yet simple approaches by researching the data using machine learning and deep learning techniques.
- Provide analytics on the most frequently failing equipment in the topology using the H2O analytics tool and build a dashboard.
- Experienced with Jira and Bitbucket, source control systems such as Git and SVN, and development tools such as Jenkins and Artifactory.
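Below is a minimal, illustrative PySpark sketch of the MongoDB read/write pattern referenced above, assuming the MongoDB Spark connector (2.x/3.x-style API) is available on the cluster. The connection URIs, database, collection, and field names are hypothetical placeholders, not the actual EEIM schema.

```python
# Hypothetical sketch: read alarm documents from MongoDB, apply a simple
# enrichment, and write the results back (assumes the MongoDB Spark connector).
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("eeim_mongodb_bridge")
         .config("spark.mongodb.input.uri", "mongodb://mongo-host:27017/eeim.alarms")
         .config("spark.mongodb.output.uri", "mongodb://mongo-host:27017/eeim.alarms_enriched")
         .getOrCreate())

# Pull raw alarms out of MongoDB as a DataFrame.
alarms = spark.read.format("mongo").load()

# Example enrichment: keep open alarms and tag them with a category column.
enriched = (alarms.filter(F.col("status") == "OPEN")
                  .withColumn("category", F.lit("DETECTION")))

# Persist the enriched alarms back to MongoDB for the downstream application.
enriched.write.format("mongo").mode("append").save()
```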
Confidential
Hadoop Big Data Developer
Responsibilities:
- Developed workflows for the complete end-to-end ETL process: ingesting data into HDFS, validating and applying business logic, storing clean data in Hive external tables, exporting data from Hive to RDBMS sources for reporting, and escalating data quality issues.
- Worked as onsite coordinator, providing technical assistance, troubleshooting, and alternative development solutions.
- Handled importing data from various data sources, performed transformations using Spark, and loaded the data into Hive.
- Involved in performance tuning of Hive (ORC tables) from the design, storage, and query perspectives.
- Developed and deployed using Hortonworks HDP 2.3.0 in production and HDP 2.6.0 in the development environment.
- Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems.
- Worked on developing Pig scripts to create relationships between the data sets present in the Hadoop cluster.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Python and Scala.
- Worked on implementing a data lake and was responsible for data management within it.
- Developed Ruby scripts to map data to the production environment.
- Experience in analyzing data using Hive, HBase, and custom MapReduce programs.
- Developed Hive UDFs and Pig UDFs using Python scripts.
- Experienced in working with the IBM Data Science tool and responsible for ingesting the processed data into it.
- Strong knowledge of clinical information systems/EMRs/EHRs and health care files such as PGF, HL7, 837, and claims.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Worked on the Oozie workflow engine to run workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
- Responsible for configuring the cluster in IBM Cloud and maintaining the number of nodes as per requirements.
- Developed Kafka consumers to consume data from Kafka topics (see the sketch below).
- Developed shell scripts to run Hive scripts in Hive and Impala.
- Responsible for optimization of data ingestion, data processing, and data analytics.
- Expertise in developing PySpark applications that connect HDFS and HBase and allow data transfer between them.
- Worked on RDBMSs such as Oracle, DB2, SQL Server, and MySQL.
- Developed workflows to cleanse and transform raw data into useful information and load it into a Kafka queue, to be loaded into HDFS and a NoSQL database.
- Responsible for sanity testing of the system once code is deployed to production.
- Experienced in using IDEs such as Eclipse and IntelliJ to modify code in Git.
- Involved in quality assurance of the data mapped into production.
- Involved in code walk through, reviewing, testing and bug fixing.
Environment: Hadoop, MapReduce, HDFS, Sqoop, Flume, Kafka, Hive, Pig, HBase, Eclipse, DBeaver, DataGrip, SQL Developer, IntelliJ, Git, SVN, Jira, Unix
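Below is a minimal, illustrative sketch of a Kafka consumer in Python. It assumes the kafka-python client (the project consumer may have been implemented with a different client or language), and the topic name, broker addresses, and consumer group are hypothetical placeholders.

```python
# Hypothetical Kafka consumer sketch using the kafka-python client.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ingest-events",                                    # placeholder topic
    bootstrap_servers=["broker1:9092", "broker2:9092"], # placeholder brokers
    group_id="etl-consumers",
    auto_offset_reset="earliest",                       # start from the beginning if no offset exists
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Placeholder for the real processing step (validate, transform, load to HDFS/NoSQL).
    print(message.topic, message.partition, message.offset, record)
```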
Confidential
Hadoop Big Data Developer
Responsibilities:
- Involved in complete project life cycle starting from design discussion to production deployment.
- Worked closely with the business team to gather their requirements and new support features.
- Developed a 16-node cluster while designing the data lake with the Cloudera distribution.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented and configured a High Availability Hadoop cluster.
- Installed and configured Hadoop Clusters with required services (HDFS, Hive, HBase, Spark, Zookeeper).
- Developed Hive scripts to analyze data; PHI is categorized into different segments and promotions are offered to customers based on those segments.
- Extensive experience in writing Pig scripts to transform raw data into baseline data.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Worked on Oozie workflow engine for job scheduling.
- Created Hive tables and partitions and loaded the data for analysis using HiveQL queries.
- Created different staging tables such as ingestion tables and preparation tables in the Hive environment.
- Optimized Hive queries and ran Hive on top of the Spark engine.
- Worked on sequence files, map-side joins, bucketing, and static and dynamic partitioning for Hive performance enhancement and storage improvement (see the sketch below).
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Created tables in HBase to store the variable data formats of data coming from different upstream sources.
- Experience in managing and reviewing Hadoop log files.
- Good understanding of ETL tools and how they can be applied in a Big Data environment.
- Followed Agile Methodologies while working on the project.
- Bug fixing and 24/7 production support for running processes.
Environment: Hadoop, MapReduce, HDFS, Sqoop, Flume, Kafka, Hive, Pig, HBase, SQL, Shell Scripting, Eclipse, DBeaver, DataGrip, SQL Developer, IntelliJ, Git, SVN, Jira, Unix
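Below is a minimal, illustrative PySpark sketch of the Hive dynamic-partitioning pattern on an ORC table described above. The database, table, and column names are hypothetical placeholders, and bucketing and map-side join tuning are omitted for brevity.

```python
# Hypothetical sketch: create a partitioned ORC Hive table and load it
# from a staging table using dynamic-partition inserts.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive_dynamic_partition_demo")
         .enableHiveSupport()
         .getOrCreate())

# Allow dynamic-partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# ORC table partitioned by event date (placeholder schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events_orc (
        event_id    STRING,
        customer_id BIGINT,
        event_type  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Dynamic-partition insert from a staging (ingestion) table into the prepared table;
# the partition column must come last in the SELECT list.
spark.sql("""
    INSERT OVERWRITE TABLE analytics.events_orc PARTITION (event_date)
    SELECT event_id, customer_id, event_type, event_date
    FROM analytics.events_staging
""")
```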