Big Data/Talend Developer Resume
Houston, TX
SUMMARY:
- 7+ years of IT experience in analysis, design, and development in Scala, Spark, Hadoop, and HDFS environments, with additional experience in Java and J2EE.
- Experienced in developing and implementing MapReduce programs on Hadoop as per requirements.
- Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching, and MapReduce (a minimal Spark sketch follows this summary).
- Developed ETL test scripts based on technical specifications/data design documents and source-to-target mappings.
- Experienced in installing, configuring, and administering Hadoop clusters of the major Hadoop distributions Hortonworks and Cloudera.
- Experienced in working with different data sources such as flat files, spreadsheet files, log files, and databases.
- Excellent experience with Apache Hadoop ecosystem components like the Hadoop Distributed File System (HDFS), MapReduce, Sqoop, Apache Spark, and Scala.
- Extensive experience working in Oracle, DB2, SQL Server, and MySQL databases, and with core Java concepts like OOP, multithreading, collections, and I/O.
- Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Experience with the MapReduce and Pig programming models, and with installation and configuration of Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; also developed Pig UDFs and Hive UDFs to pre-process data for analysis.
- Experience with NoSQL databases like HBase and Cassandra.
- Experience in scripting using UNIX shell scripts. Proficient in Linux (UNIX) and Windows operating systems.
- Experienced in setting up data gathering tools such as Flume and Sqoop.
- Extensive knowledge of the Zookeeper process for various types of centralized configuration.
- Knowledge of monitoring and managing a Hadoop cluster using Hortonworks.
- Experienced in working with Flume to load the log data from multiple sources directly into HDFS.
- Worked with application teams to install operating system, Hadoop updates, patches and version upgrades as required.
- Experienced in analyzing, designing, and developing ETL strategies and processes, and in writing ETL specifications.
- Experience building applications using Java, Python, and UNIX shell scripting.
- Good interpersonal, communication, and problem-solving skills; a motivated team player with the ability to contribute value to the company.
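For illustration only, not taken from any project below: a minimal word-count-style aggregation over log data in the Spark 2.x Java API, representative of the Spark work this summary describes. The application name and HDFS paths are hypothetical.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class LogWordCount {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("LogWordCount"));

        // Read raw log lines from HDFS (path is hypothetical).
        JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/");

        // Classic tokenize -> pair -> reduce aggregation.
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile("hdfs:///data/logs-wordcount/");
        sc.stop();
    }
}
```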
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Oozie, Zookeeper, Spark, Storm & Kafka
Java & J2EE Technologies: Core Java
IDEs: Eclipse, NetBeans
Big data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring
Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP, FTP
ETL Tools: Informatica, Pentaho, SSRS, SSIS, BO, Crystal Reports, Cognos
Testing: WinRunner, LoadRunner, QTP
WORK EXPERIENCE:
Confidential, Houston, TX
Big Data/Talend Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Worked extensively with Flume for importing social media data
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager
- Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a high-availability cluster and integrating Hive with existing applications
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Developed Pig scripts in areas where extensive hand coding needed to be reduced.
- Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Handled importing of data from various data sources using Sqoop, performed transformations using Hive, MapReduce, loaded data into HDFS
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS
- Hands-on experience productionizing Hadoop applications: administration, configuration management, monitoring, debugging, and performance tuning
- Created HBase tables to store various data formats of PII data coming from different portfolios; processed data using Spark.
- Translated high-level design specifications into simple ETL coding and mapping standards.
- Provided cluster coordination services through Zookeeper.
- Developed complex Talend job mappings to load data from various sources using different components.
- Designed, developed, and implemented solutions using Talend Integration Suite.
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second.
- Used the Kafka producer 0.8.3 APIs to produce messages, as sketched below.
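A minimal sketch of the 0.8-line Kafka producer API referenced above, assuming the older kafka.javaapi.producer client; the broker list, topic, key, and payload are hypothetical.

```java
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // 0.8-era producer settings; broker list is hypothetical.
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // The message key drives partition assignment, spreading load across partitions.
        producer.send(new KeyedMessage<String, String>(
                "events", "user-42", "{\"action\":\"click\"}"));
        producer.close();
    }
}
```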
Environment: Hadoop (Cloudera), HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Flume, Zookeeper, Java, SQL, shell scripting, Spark, Kafka.
Confidential, Plano, TX
Big Data/ Hadoop Developer
Responsibilities:
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Responsible for developing MapReduce programs using text analytics and pattern-matching algorithms (a representative mapper is sketched at the end of this section).
- Involved in importing data from various client servers such as Remedy, Altiris, Cherwell, and OTRS into HDFS.
- Assisted the development team in installing single-node Hadoop 2.2.4 on local machines.
- Coded REST web services and clients to fetch tickets from client ticketing servers.
- Facilitated sprint planning, retrospective, and closure meetings for each sprint, and helped capture various metrics such as team status.
- Participated in architectural and design decisions with respective teams
- Developed an in-memory data grid solution across conventional and cloud environments using Oracle Coherence.
- Worked with customers to develop and support solutions that use the in-memory data grid product.
- Used Pig as an ETL tool to perform transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Optimized MapReduce code and Pig scripts; performed user interface analysis, performance tuning, and analysis.
- Performed analysis with the data visualization tool Tableau.
- Wrote Pig scripts for data processing.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Loaded the aggregated data onto DB2 for reporting on the dashboard.
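A representative sketch of a pattern-matching MapReduce mapper like the ones described above, assuming the org.apache.hadoop.mapreduce API; the class name and the INC-style ticket-id pattern are hypothetical.

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (ticket-id, 1) for every log line that references a ticket,
// so a summing reducer can count mentions per ticket.
public class TicketMentionMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final Pattern TICKET = Pattern.compile("\\b(INC\\d{6})\\b");
    private static final IntWritable ONE = new IntWritable(1);
    private final Text ticketId = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Matcher m = TICKET.matcher(value.toString());
        while (m.find()) {
            ticketId.set(m.group(1));
            context.write(ticketId, ONE);
        }
    }
}
```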
Environment: Big Data/Hadoop, JDK1.6, Linux, Python, Java, Agile, RESTful Web Services, HDFS, Map-Reduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, DB2, NoSQL, HBase and Tableau.
Confidential, NYC
Hadoop Developer
Responsibilities:
- Developed Map-Reduce programs for data analysis and data cleaning.
- Installed and configured Hortonworks Data Platform 2.1-2.3.
- Implemented Big Data solutions including data acquisition, storage, transformation and analysis.
- Wrote Map-Reduce jobs to discover trends in data usage by users.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Provided quick response to ad hoc internal and external client requests for data.
- Loaded and transformed large sets of structured and unstructured data using Hadoop.
- Developed Pig scripts in areas where extensive hand coding needed to be reduced.
- Responsible for creating Hive tables, loading data, and writing Hive queries.
- Involved in loading data from Linux file system to HDFS.
- Created complex mappings in Talend 5.x.
- Created Talend Mappings to populate the data into Staging, Dimension and Fact tables.
- Excellent knowledge of the NoSQL databases MongoDB and Cassandra.
- Handled importing data from various data sources, performed transformations using Hive and Map-Reduce, streamed using Flume and loaded data into HDFS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Impala, and Pig jobs which run independently based on time and data availability.
- Worked with the NoSQL database HBase to create tables and store data (see the sketch at the end of this section).
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
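A minimal sketch of table creation and a single write with the 0.98-era HBase Java client, of the kind referenced above; the table name, column family, and row contents are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Create the table with a single column family if it does not exist yet.
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("tickets")) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("tickets"));
            desc.addFamily(new HColumnDescriptor("d"));
            admin.createTable(desc);
        }
        admin.close();

        // Write one row keyed by ticket id.
        HTable table = new HTable(conf, "tickets");
        Put put = new Put(Bytes.toBytes("TICKET-1001"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("OPEN"));
        table.put(put);
        table.close();
    }
}
```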
Environment: Hadoop, Pig, Hive, Oozie, NoSQL, Sqoop, Flume, HDFS, HBase, MapReduce, MySQL, Hortonworks, Impala, Cassandra, MongoDB, Zookeeper.
Confidential, Peoria, IL
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing
- Importing and exporting data into HDFS and Hive using Sqoop
- Used multithreading, synchronization, caching, and memory management
- Used Java and J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC)
- Proactively monitored systems and services; handled architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Responsible for creating Hive tables, loading data, and writing Hive queries.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Supported MapReduce programs running on the cluster
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Wrote complex Hive queries and UDFs in Java and Python (a minimal Java UDF is sketched at the end of this section).
- Involved in loading data from the UNIX file system to HDFS, configuring Hive, and writing Hive UDFs
- Utilized Java and MySQL from day to day to debug and fix issues with client processes
- Managed and reviewed log files
- Implemented partitioning, dynamic partitions, and buckets in Hive
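A minimal Java Hive UDF sketch of the kind referenced above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name and normalization logic are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Normalizes free-text columns before analysis. Registered in Hive with:
//   ADD JAR clean-text.jar;
//   CREATE TEMPORARY FUNCTION clean_text AS 'CleanText';
public final class CleanText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Lower-case and collapse runs of whitespace to a single space.
        return new Text(input.toString().toLowerCase().trim().replaceAll("\\s+", " "));
    }
}
```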
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, CouchDB, Python, Java, Flume, HTML, XML, SQL, MySQL, J2EE, Eclipse
Confidential
Java Project
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed UI-layer logic for sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- The Agile Scrum methodology was followed for the development process.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and data validation on the client side within the forms.
- Experience in writing PL/SQL stored procedures, functions, triggers, Oracle Reports, and complex SQL.
- Worked with JavaScript to perform client-side form validations, and implemented an innovative logging approach for all interdependent applications.
- Used Struts tag libraries as well as the Struts Tiles framework.
- Used JDBC to access the database with the Oracle thin driver (Type 4) for application optimization and efficiency. Created connections through JDBC and used JDBC statements to call stored procedures, as sketched at the end of this section.
- Client-side validation was done using JavaScript.
- Used the Data Access Object pattern to make the application more flexible toward future and legacy databases.
- Actively involved in tuning SQL queries for better performance.
- Developed the application using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Developed data mappings to create a communication bridge between various application interfaces using XML and XSL.
- Proficient in developing applications with exposure to Java, JSP, UML, Oracle (SQL, PL/SQL), HTML, JUnit, JavaScript, Servlets, Swing, DB2, and CSS.
- Used Spring IoC to inject the values for the dynamic parameters.
- Developed a JUnit testing framework for unit-level testing.
- Actively involved in code review and bug fixing to improve performance.
- Documented application for its functionality and its enhanced features.
- Successfully delivered all product deliverables with zero defects.
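A minimal sketch of calling a PL/SQL stored procedure over JDBC with the Oracle thin driver, as described above; the connection URL, credentials, and procedure signature are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class StoredProcClient {
    public static void main(String[] args) throws Exception {
        // Connection URL and credentials are hypothetical.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret");

        // Call a PL/SQL procedure with one IN and one OUT parameter.
        CallableStatement cs = conn.prepareCall("{call get_account_status(?, ?)}");
        cs.setLong(1, 1001L);                       // IN: account id
        cs.registerOutParameter(2, Types.VARCHAR);  // OUT: status
        cs.execute();
        System.out.println("status = " + cs.getString(2));

        cs.close();
        conn.close();
    }
}
```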
Environment: Spring MVC, Oracle (SQL, PL/SQL), J2EE, Java, Struts, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008