Data Analyst and ETL Developer Resume
Schaumburg, Illinois
SUMMARY:
- More than nine years of experience as a Data Analyst and ETL Developer at Confidential (Confidential Corp.) and Confidential (on contract).
- Good knowledge of designing and developing data pipelines, working with large data sets and distributed computing, integrating data from multiple sources and formats, and migrating data from legacy mainframe and AS/400 systems to modern data warehouses and data lakes.
- Proficiency in multiple big data technologies like MapReduce programming, Apache Spark, Kafka, Hive, Pig, Sqoop, and Flume, and NoSQL databases like HBase, MongoDB, and Cassandra.
- Good knowledge of the Hadoop ecosystem, including HDFS, YARN, Apache Zookeeper, Hue, and Ambari, and of implementing Lambda Architecture.
- Experienced in cloud computing using Google Cloud Platform (GCP), Dataproc, Docker containers, Kubernetes, and the Hortonworks and Cloudera platforms.
- Practical knowledge of Machine Learning (ML) libraries like Spark MLlib and ML techniques like collaborative filtering and sentiment analysis using Twitter APIs.
- Knowledge of ETL techniques and frameworks, utilizing Flume and Sqoop to migrate data from relational databases and streaming sources into Hadoop.
- Practical experience with version control software like Subversive (SVN) and Git, dependency management tools like Apache Maven, build management tools like Apache Ant, and DevOps continuous integration (CI) tools like Jenkins/Hudson.
- Good knowledge of Java IDEs like Eclipse and IntelliJ IDEA, and of Jupyter Notebooks (Apache Toree) for Spark-Scala application development and debugging.
- Experienced in both Agile and Waterfall projects, in fast-paced, rapidly changing, and challenging environments, collaborating with other developers, data scientists, and application architects.
- Consistently rated one of the top contributors at Confidential in the past.
TECHNICAL SKILLS:
Big data technologies: Hadoop MapReduce using Java, HDFS, YARN, Zookeeper, Ambari, Hive, Pig, Spark, Kafka.
NoSQL databases: Cassandra, MongoDB, HBase.
IDEs: Eclipse, IntelliJ IDEA, Jupyter Notebooks (Apache Toree).
RDBMS: DB2, MySQL, Microsoft SQL Server.
Programming Languages: Core Java, Scala, SQL, HiveQL.
Operating Systems: Linux, Windows, z/OS, AS/400 (i-Series).
Team Collaboration/SDLC tools: Confidential's Rational Team Concert (RTC), Rational Quality Manager (RQM), Atlassian Jira, Confluence, and MS SharePoint.
Microsoft Office: MS Excel, MS Word, PowerPoint, Microsoft Project.
Version controlling/Build management/CI: Subversive (SVN), SourceTree, Git, Ant, Jenkins/Hudson.
PROFESSIONAL EXPERIENCE:
Confidential
Responsibilities:
- Worked at Confidential as a mainframe developer for four years, building data processing jobs using COBOL and other mainframe technologies. Later worked for five years as a big data developer, responsible for developing large-scale data processing frameworks using Hadoop MapReduce Java programs, Pig, and Hive. More recently, worked on real-time streaming and batch data processing APIs such as the Apache Spark RDD/DataFrame/Dataset APIs, Spark Streaming, Spark SQL, Kafka integration with Spark, and Kafka-Twitter integration using Scala, along with NoSQL databases like Cassandra and MongoDB (see the sketch below).
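To illustrate the kind of Spark-Kafka integration referenced above, here is a minimal Spark Structured Streaming sketch in Scala. The broker address, topic name, and console sink are placeholder assumptions for illustration only, not the actual application code, and the job assumes the spark-sql-kafka connector is on the classpath.

import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of records from a Kafka topic (broker and topic are placeholders)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Kafka keys/values arrive as bytes; cast to strings and count records per key
    val counts = raw
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .groupBy($"key")
      .count()

    // Write the running counts to the console for debugging
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}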
Confidential, Schaumburg, Illinois
Responsibilities:
- Worked as a Data Analyst in charge of managing the enterprise data lake built on Hadoop clusters. Incoming claims data from various source systems was stored in Hadoop for reporting, visualization, analytics, and machine learning. Used technologies such as MapReduce programming, Apache Pig, and Hive to process the data and provide it to downstream applications and to Business Intelligence (BI) and analytics teams (see the sketch below).
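As a rough illustration of this kind of data-lake aggregation, the Scala sketch below uses Spark SQL over a Hive metastore rather than the original Pig/Hive scripts; the database, table, and column names are hypothetical.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClaimsLakeSummary {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark query tables registered in the data lake's metastore
    val spark = SparkSession.builder()
      .appName("ClaimsLakeSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical claims table; names and columns are placeholders
    val claims = spark.table("claims_db.claims_raw")

    // Aggregate claim counts and total paid amount per source system and month
    val summary = claims
      .withColumn("claim_month", date_format(col("claim_date"), "yyyy-MM"))
      .groupBy("source_system", "claim_month")
      .agg(count("*").as("claim_count"), sum("paid_amount").as("total_paid"))

    // Persist the summary for downstream BI and analytics consumers
    summary.write.mode("overwrite").saveAsTable("claims_db.claims_monthly_summary")

    spark.stop()
  }
}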
Confidential, Hartford, Connecticut
Responsibilities:
- As a Spark Scala Developer, I was responsible for creating real-time and batch data processing systems that monitor claims activity, detect and flag fraudulent claims, and create pricing tables for business teams. My team developed an application using Apache Spark and Kafka that processes gigabytes of data every day to calculate appropriate premiums for insurance policies for our property and casualty insurance client, based on factors such as the type of business, geographical area, and historical income/losses of the industry. The application utilized data from the data warehouse and real-time data from third-party REST API sources to achieve this (see the sketch below).
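The simplified Scala sketch below shows, in batch form, how policy exposure data might be joined with rating factors to produce a pricing table; the table names, columns, and premium formula are hypothetical placeholders rather than the client's actual rating logic.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PremiumPricing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PremiumPricing").getOrCreate()

    // Hypothetical warehouse tables; names and columns are placeholders
    val policies    = spark.table("dw.policies")      // business_type, region, base_exposure
    val rateFactors = spark.table("dw.rate_factors")  // business_type, region, rate, loss_ratio_adj

    // Join policy exposure with rating factors keyed on business type and region,
    // then compute an indicative premium adjusted for historical losses
    val priced = policies
      .join(rateFactors, Seq("business_type", "region"))
      .withColumn("premium", col("base_exposure") * col("rate") * (lit(1.0) + col("loss_ratio_adj")))

    // Write the pricing table for business teams to review
    priced.write.mode("overwrite").saveAsTable("dw.pricing_table")

    spark.stop()
  }
}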
Confidential
Responsibilities:
- Currently working for Confidential as a Sr. Data Analytics Developer in downtown Toronto. My team is building an application that uses machine learning and AI to create a smart spending app that lets customers analyze their spending habits over several years, makes recommendations to save money, and suggests switching to different types of products based on their spending patterns. My role on the team is to extract customer data from the data warehouse, join data from multiple sources, categorize transactions, aggregate the data, and provide it to the ML program in a custom format in real time (see the sketch below).
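As an illustrative sketch of this extract-join-categorize-aggregate flow, the Scala/Spark example below builds per-customer spending features; the table names, columns, and output path are hypothetical, and the real pipeline delivers data to the ML program in real time rather than as a batch Parquet write.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SpendingFeatures {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SpendingFeatures").getOrCreate()

    // Hypothetical source tables; names and columns are placeholders
    val transactions = spark.table("dw.card_transactions")   // customer_id, merchant_code, amount, txn_date
    val merchants    = spark.table("dw.merchant_categories")  // merchant_code, category

    // Join transactions to merchant categories, then aggregate spend per customer,
    // category, and month as input features for the ML program
    val features = transactions
      .join(merchants, Seq("merchant_code"), "left")
      .withColumn("category", coalesce(col("category"), lit("UNCATEGORIZED")))
      .withColumn("month", date_format(col("txn_date"), "yyyy-MM"))
      .groupBy("customer_id", "category", "month")
      .agg(sum("amount").as("total_spend"), count("*").as("txn_count"))

    // Write the aggregated features for the downstream ML consumer
    features.write.mode("overwrite").parquet("/data/ml/spending_features")

    spark.stop()
  }
}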