Hadoop/Data Analyst Resume
Hartford, CT
SUMMARY
- 7+ years of software development experience, with 2.5 years of experience in Hadoop and big data related technologies.
- Excellent knowledge of the internal workings of the HDFS filesystem and MapReduce
- In-depth knowledge of Hadoop ecosystem components such as Pig, Hive, Sqoop, Flume, Oozie, and ZooKeeper.
- Experience with the enterprise Hadoop distributions from Cloudera and Hortonworks.
- Built, deployed, and managed a 155-node Hadoop cluster based on Cloudera's Distribution of Hadoop
- Experience in configuring High Availability for Cloudera Manager 5 and its services
- Strong knowledge of configuring quorum-based NameNode high availability with automatic failover using ZooKeeper, and NameNode federation
- Experience in monitoring and managing large-scale, multi-node production Hadoop clusters on CDH4 and CDH5
- Experience in performing minor and major upgrades and in commissioning and decommissioning nodes on a Hadoop cluster
- Experience in managing Hadoop processes both manually and through init scripts
- Experience in HDFS and MapReduce maintenance tasks, including adding a DataNode/TaskTracker, checking filesystem integrity with fsck, and balancing HDFS block data
- Strong knowledge of Apache Hive and Pig administration and deployment
- Experience in working on production support and maintenance-related projects
- Hands-on experience in data mining, implementing complex business logic, and optimizing queries using HiveQL; controlled data distribution through partitioning and bucketing techniques to enhance performance (a HiveQL sketch follows this summary).
- Solid experience in Pig administration and development, including writing Pig UDFs (Eval, Filter, Load, and Store) and macros (a UDF sketch follows this summary)
- Experience in embedding Hive and Pig in Java
- Experience in using HCatalog with Hive, Pig, and HBase
- Exposure to the NoSQL databases HBase and Cassandra
- Developed ETL processes to load data into HDFS using tools such as Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools
- Familiar with importing and exporting data using Sqoop
- Experience in writing MapReduce joins, such as map-side joins using the DistributedCache API (a mapper sketch follows this summary)
- Experience in planning, designing, and developing applications spanning the full software development life cycle: functional specification, design, implementation, documentation, unit testing, and support.
- Experience in working on production support environments for large applications, involving complex issues, bug fixes, daily/monthly/yearly maintenance activities, batch job monitoring, and troubleshooting.
- Excellent team player with good communication, interpersonal, and presentation skills.
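As a sketch of the partitioning and bucketing approach referenced above: the HiveQL below is illustrative only, and the table, column, and setting values are assumptions rather than taken from a specific project.

    -- Partition by load date and bucket by customer_id to spread data evenly across files
    CREATE TABLE complaints (
      complaint_id BIGINT,
      customer_id  BIGINT,
      severity     STRING,
      description  STRING
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS;

    -- Allow dynamic partitions and enforce bucketing when loading from a staging table
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    INSERT OVERWRITE TABLE complaints PARTITION (load_date)
    SELECT complaint_id, customer_id, severity, description, load_date
    FROM staging_complaints;

Pruning queries by load_date and joining on the bucketed customer_id column is what provides the performance gain.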
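A minimal sketch of a Pig Eval UDF of the kind mentioned above; the class name and field handling are hypothetical.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Illustrative Eval UDF: upper-cases the first field of the input tuple.
    // Used from a Pig script after REGISTER myudfs.jar, e.g. B = FOREACH A GENERATE ToUpper(name);
    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return ((String) input.get(0)).toUpperCase();
        }
    }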
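A sketch of a map-side join using the distributed cache, assuming the driver ships a small customer lookup file to every mapper; the file, field, and class names are illustrative.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-side join: a small lookup file shipped to every mapper through the distributed
    // cache is loaded into memory in setup() and joined against the large input in map(),
    // so no reduce phase is needed. Assumes the driver registered the file with
    //   job.addCacheFile(new URI("/lookup/customers.txt#customers.txt"));
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<String, String>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // The "#customers.txt" fragment symlinks the cached file into the task directory
            BufferedReader reader = new BufferedReader(new FileReader("customers.txt"));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);       // customer_id,customer_name
                lookup.put(parts[0], parts[1]);
            }
            reader.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(","); // complaint record keyed by customer_id
            String name = lookup.containsKey(fields[0]) ? lookup.get(fields[0]) : "UNKNOWN";
            context.write(new Text(fields[0]), new Text(name + "\t" + value.toString()));
        }
    }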
TECHNICAL SKILLS
Languages: C++, Java, VB, Shell Scripting, IBM AS/400 (RPG, CL, Subfiles, Display Files), PL/SQL, ASP.NET
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase
Tools & Technologies: Rational Rose, Eclipse/SpringSource IDE, Microsoft Office Suite, MATLAB, Xcode, NetBeans, Microsoft Visual Studio
Databases: Oracle, MySQL, SQL Server, IBM DB2/400
Operating Systems: Windows XP/7/8, Linux RedHat/Ubuntu/CentOS
Monitoring Tools: Cloudera Manager, Ganglia, Nagios, Ambari
Version Control: Git, Microsoft Visual SourceSafe, Subversion (SVN)
PROFESSIONAL EXPERIENCE
Confidential, Hartford, CT
Hadoop/Data Analyst
Responsibilities:
- Actively participated with the development team to meet specific customer requirements and proposed effective Hadoop solutions.
- Installed and configured the cluster, including setup of the NameNode, DataNodes, JobTracker, and TaskTrackers.
- Worked closely with the complaint processing teams to determine the severity of each complaint.
- Collected the history of customers who had registered a complaint.
- Performed text mining on transcripts of phone logs from customers reaching out to customer care, using MapReduce with R.
- Aggregated all the data of a customer who had filed a complaint, using Pig and Hive with MySQL.
- Exported the data to an RDBMS using Sqoop (an example command follows this list).
- Identified patterns in customer complaints.
- Reached out to customers who had filed a complaint, or had a high tendency to file one, and resolved the issue within a week.
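An illustration of the Sqoop export step mentioned above; the connection string, credentials, table, and directory are placeholders.

    sqoop export \
      --connect jdbc:mysql://dbhost:3306/complaints_db \
      --username etl_user -P \
      --table complaint_summary \
      --export-dir /user/hive/warehouse/complaint_summary \
      --input-fields-terminated-by '\001' \
      --num-mappers 4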
Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, VMware, Eclipse, Cloudera, Hortonworks
Confidential, Hartford, CT
Hadoop Consultant
Responsibilities:
- Worked closely with the claims processing team to identify patterns in the filing of fraudulent claims.
- Performed a major upgrade of the cluster from CDH3u6 to CDH4.4.0.
- Developed MapReduce programs to extract and transform the data sets; results were exported back to an RDBMS using Sqoop.
- Observed patterns in fraudulent claims using text mining in R and Hive.
- Installed, configured, and managed the Flume infrastructure.
- Responsible for importing data (mostly log files) from various sources into HDFS using Flume (an agent configuration sketch follows this list).
- Created tables in Hive and loaded the structured data resulting from MapReduce jobs.
- Developed numerous HiveQL queries to extract the required information.
- Exported the required information to an RDBMS using Sqoop, making the data available to the claims processing team to assist in processing claims.
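A minimal sketch of a Flume agent configuration for landing log files in HDFS, as described above; the agent name, source command, and HDFS path are assumptions.

    # Illustrative Flume agent: tail an application log and write it to HDFS
    agent1.sources  = logsrc
    agent1.channels = memch
    agent1.sinks    = hdfssink

    agent1.sources.logsrc.type     = exec
    agent1.sources.logsrc.command  = tail -F /var/log/claims/app.log
    agent1.sources.logsrc.channels = memch

    agent1.channels.memch.type     = memory
    agent1.channels.memch.capacity = 10000

    agent1.sinks.hdfssink.type                   = hdfs
    agent1.sinks.hdfssink.channel                = memch
    agent1.sinks.hdfssink.hdfs.path              = hdfs://namenode:8020/data/logs/%Y-%m-%d
    agent1.sinks.hdfssink.hdfs.fileType          = DataStream
    agent1.sinks.hdfssink.hdfs.useLocalTimeStamp = true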
Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, VMware, Eclipse, Cloudera
Confidential, CA
Data Analyst
Responsibilities:
- Worked with users to identify the most appropriate source of record and to profile the data required for sales and service.
- Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
- Involved in defining the business/transformation rules applied to sales and service data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Worked with internal architects, assisting in the development of current- and target-state data architectures.
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Involved in defining the source-to-target data mappings, business rules, and business and data definitions.
- Responsible for defining the key identifiers for each mapping/interface.
- Responsible for defining the functional requirement documents for each source-to-target interface.
- Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between the client and vendor systems.
- Prepared data quality and traceability documents for each source interface.
- Established standard operating procedures.
- Generated weekly and monthly asset inventory reports.
- Evaluated data profiling, cleansing, integration, and extraction tools (e.g., Informatica).
- Coordinated with business users to provide an appropriate, effective, and efficient way to design new reporting based on user needs and the existing functionality.
- Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
- Implemented the metadata repository; maintained data quality, data cleanup procedures, transformations, data standards, and the data governance program; developed scripts, stored procedures, and triggers; and executed test plans.
Environment: SQL Server, Oracle 9i, MS Office, Teradata, Informatica, ER/Studio, XML, Business Objects
Confidential
Java Developer
Responsibilities:
- Used agile methodology in designing and developing the modules.
- Collected user stories to document the requirements of the product catalog, product ordering, and approval modules.
- Used the Struts Validator framework to validate user input.
- Developed an MVC-based user interface using JSP, XML, HTML, and Struts.
- Used the JSF framework to develop user interfaces with JSF UI components, validators, events, and listeners.
- Used Apache Axis to generate the order products web services module.
- Designed and implemented WSDL/SOAP web services to provide the interface to various clients running both Java and non-Java applications.
- Identified and implemented J2EE design patterns such as Service Locator, Business Delegate, and DAO.
- Used SoapUI to test the web services.
- Built the application using standard design patterns such as DAO, Abstract Factory, Session Facade, Business Delegate, and MVC.
- Used JUnit for unit testing and Log4j as the logging framework.
- Used Hibernate as the persistence mapping technology, mapping and configuring the POJO classes against database tables (an entity mapping sketch follows this list).
- Participated in and contributed to group sessions, design reviews, and code analysis.
- Used an SVN repository for version control.
- Used Eclipse IDE for development.
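A minimal sketch of the Hibernate POJO-to-table mapping mentioned above, assuming annotation-based mapping; the entity, table, and column names are illustrative.

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // Illustrative POJO mapped to a catalog table via JPA/Hibernate annotations
    @Entity
    @Table(name = "PRODUCT_CATALOG")
    public class Product {

        @Id
        @GeneratedValue(strategy = GenerationType.AUTO)
        @Column(name = "PRODUCT_ID")
        private Long id;

        @Column(name = "PRODUCT_NAME", nullable = false)
        private String name;

        @Column(name = "UNIT_PRICE")
        private Double price;

        public Long getId() { return id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Double getPrice() { return price; }
        public void setPrice(Double price) { this.price = price; }
    }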
Environment: Java, J2EE, Struts, Hibernate, JSP, HTML, WebSphere, Oracle 10g, Apache Ant, Log4J, RAD, Eclipse IDE, JUnit, Subversion, Axis, WSDL, Web Services.