Hadoop Engineer Resume
Boston, MA
SUMMARY:
- Working as an Architect on Big Data solutions involving Cloudera and Microsoft Azure HDInsight.
- Cloudera Certified Developer for Apache Hadoop with 16+ years of IT experience in development, design, application support and maintenance, and managing IT application projects.
- Experience in planning and defining scope, developing schedules, budgeting, cost estimation, team leadership, and monitoring and reporting progress.
- Experience applying Java design patterns using open-source products.
- Experience in end-to-end solutions using Hadoop HDFS, MapReduce, Storm, Solr, Kafka, Scala, Pig, Hive, HBase, Sqoop, Oozie and ZooKeeper, and in performance-tuning Hadoop clusters.
- Experience in Python programming on Spark.
- Experience in real-time streaming using Kafka and Storm (POC).
- Experience in installing, configuring and upgrading Hadoop clusters using Cloudera Manager.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experience working with large databases such as Oracle, MySQL and DB2.
- Experience in data migration and data modeling using NoSQL databases.
- Knowledge of data analytics using R, Apache Hadoop and MapReduce.
- Knowledge of data-analysis techniques such as clustering, classification, regression, forecasting and prediction using R.
- Knowledge of BI and data-warehouse processes and techniques.
- Expert in multi-threaded applications using VC++ and C++ in Windows, UNIX and Solaris environments.
- Experience in various Agile methodology activities, UMF, UML and design patterns.
- Experience in various phases of software development - study, analysis, development, testing, implementation and maintenance of real-time systems.
- Experience in managing and leading projects using an onsite/offshore model.
TECHNICAL SKILLS:
Programming Languages: Java, Python, R, MapReduce, Pig Latin, Spark, Scala, C++, VC++, C# .NET 4.0.
Tools: MS Project, Visual Studio, CVS, Sqoop, Oozie, ZooKeeper, Spark, Storm, Kafka, AWS, PowerShell.
Frameworks: Hadoop HDFS (Apache, Cloudera, Hortonworks), Cassandra, Microsoft Azure HDInsight, MVC.
Databases: MySQL, Oracle, Teradata, Hive, HBase, MongoDB, Sybase, DB2.
Operating Systems: Windows, UNIX, Linux, Solaris.
Theoretical Knowledge: Flume, Solr, MicroStrategy.
PROFESSIONAL EXPERIENCE:
Confidential, Boston, MA
Hadoop Engineer
Responsibilities:
- Architecting and designing the data pipeline.
- Interacting with data modelers and creating the refined data scripts.
- Curating and scrubbing data with Pig and loading it to the cloud environment.
- Creating Hive external tables for data scientists to use for analytics.
- Involved in data optimization and performance tuning of Hive queries on Spark.
- Developing and resolving queries on a Spark cluster using Python.
- Supporting the visualization team in creating dashboards in QlikView.
Environment: Microsoft Azure HDInsight on Windows and Linux, Spark, Python, Hive, BLOB Storage, ETL (Informatica), QlikView, PowerShell, Pig.
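The "Hive external tables for data scientists" work above can be sketched as follows. This is an illustrative example, not code from the project: the table, column, and storage-path names are hypothetical, but the DDL shape is standard HiveQL for external tables over delimited files.

```python
# Illustrative sketch: generating the kind of CREATE EXTERNAL TABLE statement
# used to expose curated data to analysts. Names and paths are hypothetical.

def create_external_table_ddl(table, columns, location, field_delim=","):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for delimited text files."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{field_delim}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )

ddl = create_external_table_ddl(
    "curated_sales",
    [("sale_id", "BIGINT"), ("sale_date", "STRING"), ("amount", "DOUBLE")],
    "wasb:///curated/sales",  # Azure BLOB storage path (hypothetical)
)
print(ddl)
```

Because the table is EXTERNAL, dropping it removes only the metadata, not the underlying files in BLOB storage.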
Confidential, New York
Developer /Architect
Responsibilities:
- Architecting and designing the project from the start.
- Interacting with data scientists and gathering requirements.
- Installing and performance-tuning the Spark cluster.
- Developing and integrating an application on Spark (RDDs) using the Hadoop cluster.
- Coding and implementing Hive tables using monthly partitions.
- Importing data from Oracle using Sqoop and other files through FTP.
- Developing and scripting in Python.
- Developing and running scripts in a Linux production environment.
- Reporting and visualizations using BusinessObjects (Tableau).
Environment: Cloudera 5.3, RHEL, R, Sqoop, SparkR, Oozie, Hive, Python, BO, Kafka, Spark Streaming.
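The monthly-partitioning scheme mentioned above can be sketched like this. It is an illustrative example under assumed record layout (a `txn_date` field in `YYYY-MM-DD` form), not code from the project: each record is assigned a Hive-style month partition value before a partitioned load.

```python
# Illustrative sketch (assumed record layout): deriving the monthly partition
# key used when loading data into a Hive table partitioned by month.
from collections import defaultdict
from datetime import datetime

def month_partition(date_str, fmt="%Y-%m-%d"):
    """Return a Hive-style partition value like '2015-06' for a record date."""
    return datetime.strptime(date_str, fmt).strftime("%Y-%m")

def group_by_month(records):
    """Bucket records by monthly partition before a partitioned Hive load."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[month_partition(rec["txn_date"])].append(rec)
    return dict(buckets)

rows = [{"txn_date": "2015-06-01", "amt": 10.0},
        {"txn_date": "2015-06-15", "amt": 5.5},
        {"txn_date": "2015-07-02", "amt": 7.0}]
parts = group_by_month(rows)
```

Partitioning by month keeps each load incremental and lets Hive prune whole months from queries that filter on the partition column.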
Confidential
Developer /Architect
Responsibilities:
- Analyze Barclays' enterprise architecture.
- Propose a solution for migrating from Teradata to the Hadoop platform.
- Design the actual solution.
- Analyze the tools and technologies required, and the support the latest versions of the existing software provide for Hadoop HDFS.
- Propose technological upgrades.
- Size the Hadoop cluster.
- Develop a solution for end-to-end data flow from Oracle to HDFS, access data in HDFS from Informatica, and push/pull data between Teradata and HDFS.
Environment: Java 1.7, Cloudera 4, Sqoop 1.4.3, Hive 0.96, RHEL, Eclipse Helios, Informatica 9.5, Teradata, Oozie, ZooKeeper, Oracle.
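The Oracle-to-HDFS leg of the data flow described above would typically be driven by a Sqoop import. The sketch below assembles such a command; the host, user, and table names are hypothetical, while the flags are standard Sqoop 1.4.x options.

```python
# Illustrative sketch: assembling the kind of `sqoop import` command used to
# pull an Oracle table into HDFS/Hive. Connection details are hypothetical.
def sqoop_import_cmd(jdbc_url, user, table, target_dir, mappers=4):
    """Build a `sqoop import` command using standard Sqoop 1.4.x flags."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",                      # prompt for the password instead of embedding it
        "--table", table,
        "--target-dir", target_dir,
        "--hive-import",           # also load the imported data into a Hive table
        "-m", str(mappers),        # number of parallel map tasks
    ]

cmd = sqoop_import_cmd(
    "jdbc:oracle:thin:@dbhost:1521:ORCL", "etl_user",
    "CUSTOMERS", "/staging/customers")
print(" ".join(cmd))
```

Such commands are usually wrapped in an Oozie workflow so the import, Hive load, and downstream steps run as one scheduled pipeline.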
Confidential
Developer /Architect
Responsibilities:
- Responsible for performing in-depth analysis and conceptualization of a Retail Banking Customer 360-degree view.
- Responsible for creating use cases/functional requirements.
- Responsible for designing the user interface.
- Responsible for creating entity model design and user interface creation using AppBuilder.
- Responsible for the overall solution delivery developed using InfoSphere Data Explorer.
- Created the estimation and work breakdown structure for the solution.
- Planned the development and helped the team resolve technical issues.
- Worked with Data Explorer product development team to resolve technical issues and identify solutions.
- Published solution offering document for this solution.
- Responsible for gathering business requirements from the banking SME for processing credit card historical data.
- Responsible for loading credit card historical data into Hadoop.
- Responsible for designing and developing MapReduce programs that process the credit card historical data for analytics and generate output, which is then indexed via an API for visualization.
- Involved in setting up a 50-node Hadoop cluster for executing the solution.
Environment: Hortonworks, DB2 9.7, Linux, Hive, HBase, Zookeeper.
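The analytics MapReduce jobs described above follow the classic map/shuffle/reduce shape. The real jobs were written for Hadoop; the sketch below mimics that shape in plain Python with hypothetical field names, purely for illustration.

```python
# Illustrative Python sketch of the map/reduce shape of the analytics jobs:
# map emits (key, value) pairs, reduce aggregates per key after the sort.
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Emit (merchant_category, amount) pairs, like a Mapper."""
    for rec in records:
        yield rec["category"], rec["amount"]

def reduce_phase(pairs):
    """Sum amounts per key after the shuffle/sort, like a Reducer."""
    out = {}
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        out[key] = sum(amount for _, amount in group)
    return out

txns = [{"category": "grocery", "amount": 20.0},
        {"category": "travel", "amount": 300.0},
        {"category": "grocery", "amount": 15.0}]
totals = reduce_phase(map_phase(txns))
```

In the Hadoop version the framework performs the sort/shuffle between phases and runs many mapper and reducer tasks in parallel across the cluster.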
Confidential
Project Lead
Responsibilities:
- Developed classes for an end-to-end framework for interacting with Hadoop.
- Worked on Hive queries to accomplish inserts and updates on data through joins and functions, including UDFs.
- Participated extensively in the design and development of the project.
Environment: Apache Hadoop, Hive, Java.
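Hive UDF-style logic such as the above is often written in Java, but Hive can also stream rows through an external Python script via `SELECT TRANSFORM ... USING`. The sketch below shows that pattern; the tab-separated column layout (id, name) is hypothetical.

```python
# Illustrative sketch: UDF-like row transformation in the style Hive uses for
# SELECT TRANSFORM streaming scripts. Column layout is hypothetical.
def transform_line(line):
    """Upper-case the `name` column of a tab-separated Hive row."""
    cols = line.rstrip("\n").split("\t")
    cols[1] = cols[1].upper()
    return "\t".join(cols)

# In production the script reads sys.stdin and prints each transformed row;
# Hive would invoke it with: SELECT TRANSFORM(id, name) USING 'python udf.py' ...
print(transform_line("42\talice"))
```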
Confidential
Developer / Project Lead
Responsibilities:
- De-duplicate incoming data using MapReduce written in Java
- Write transformation queries in Hive
- Write UDFs in Hive
Environment: Hadoop, Hive, Amazon Web Services, Java.
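The de-duplication job above was Java MapReduce; its core pattern - key each record on a natural key, keep one record per key - is sketched here in Python for brevity, with hypothetical field names.

```python
# Illustrative sketch of the de-duplication pattern used by the Java MapReduce
# job: records sharing a composite key collapse to a single record.
def dedupe(records, key_fields):
    """Keep the first record seen for each composite key (order-preserving)."""
    seen = set()
    out = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

rows = [{"id": 1, "src": "a"}, {"id": 1, "src": "a"}, {"id": 2, "src": "b"}]
unique = dedupe(rows, ["id", "src"])
```

In the MapReduce version the mapper emits the composite key and the framework's shuffle groups duplicates onto one reducer, which emits a single record per key.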
Confidential - Dallas, TX
Project Lead
Responsibilities:
- Managing and leading the project team
- Detailed project planning and controlling
- Managing project deliverables in line with the project plan
- Coach, mentor and lead personnel within a technical team environment
- Recording and managing project issues and escalating where necessary
- Monitoring project progress and performance
- Providing status reports to the project sponsor
- Managing project training within the defined budget
Environment: Java, UNIX.
Confidential
Project Lead
Responsibilities:
- Enhancing the application
- Developing and maintaining a detailed project plan
- Understanding SRS and Functional Specification documents
- Design documents, coding and test-case preparation
- Client interaction and status reporting to manager and above
- Liaising with, and reporting progress to, the project steering board/senior management
- Managing project evaluation and dissemination activities
Environment: VC++, MySQL, Windows XP
Confidential
Project Lead
Responsibilities:
- Involved in R&D on interfacing with Excel from Python
- Understanding SRS and Functional Specification documents
Environment: Python
Confidential
Responsibilities:
- Analysis
- Understanding SRS and Functional Specification documents
- Design and coding of the product
Environment: Visual C++ 2005
Confidential
Team Lead
Responsibilities:
- Analysis
- Understanding SRS and Functional Specification documents
- Design and coding of the product
Environment: Visual C++ 6.0, Win CVS, Oracle 10g.
Confidential
Developer /Team Lead
Responsibilities:
- Involved in the interaction with the customer for study and analysis of the requirements of the system.
- Responsible for analysis and design of the modules using RequisitePro and Rational Rose. This involved highly interactive displays, a handy programming interface to coordinate multiple threads of execution, and structured exception-handling techniques for handling errors.
- Handling the Critical Analysis Module, which deals with real-time results.
- Responsible for testing the functionality of devices at the customer site.
- Responsible for developing various modules using VC++ in a Windows NT environment. Integrated the modules with the project using Visual SourceSafe 5.0.
- Involved in integrating and integration-testing the total system.
- Tested the developed modules using real-time software simulators, and also in a real-time environment with all the devices, using tools such as Rational Purify and Quantify.
- Involved in designing the architecture of the overall system.
- Involved in designing and developing the communication mechanism, which deals with real-time background processes.
Environment: Windows NT, Solaris, Visual C++ 6.0, VSS, pSOS (RTOS), VC++.